Processing Module ⚙️
API reference for the memory::processing module, which provides core data processing functionality for single-cell RNA-seq analysis.
Normalization
normalize_expression
Normalizes expression data to a target count per observation.
pub fn normalize_expression(
    matrix: &IMArrayElement,
    expression_target: u32,
    direction: &Direction,
    precision: Option<Precision>
) -> anyhow::Result<()>
Parameters
matrix: Expression matrix to normalize
expression_target: Target sum for normalization (typically 10,000)
direction: Either Direction::Row (normalize cells) or Direction::Column (normalize genes)
precision: Optional floating-point precision (Single for f32, Double for f64)
Returns
anyhow::Result<()>: Success or error
Example
normalize_expression(&matrix, 10000, &Direction::ROW, Some(Precision::Single))?;
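To make the row-wise case concrete, here is a minimal sketch of target-sum normalization on a plain dense `Vec<Vec<f64>>`. It is an illustration only and not the module's implementation: the helper name `normalize_rows_to_target` is hypothetical, each row is assumed to be one cell, and leaving all-zero rows untouched is an assumption of this sketch.

```rust
/// Scales every row (cell) so that its values sum to `target`.
/// Hypothetical helper for illustration; rows whose sum is zero are left unchanged.
fn normalize_rows_to_target(matrix: &mut [Vec<f64>], target: f64) {
    for row in matrix.iter_mut() {
        let total: f64 = row.iter().sum();
        if total > 0.0 {
            // One scale factor per cell keeps the relative proportions of genes intact.
            let scale = target / total;
            for value in row.iter_mut() {
                *value *= scale;
            }
        }
    }
}
```

After this step every non-empty cell has the same total, so differences in sequencing depth no longer dominate comparisons between cells.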
log1p_expression
Applies natural logarithm transformation after adding 1 (log1p) to the data.
pub fn log1p_expression(
    matrix: &IMArrayElement,
    precision: Option<Precision>
) -> anyhow::Result<()>
Parameters
matrix: Expression matrix to transform
precision: Optional floating-point precision (Single for f32, Double for f64)
Returns
anyhow::Result<()>: Success or error
Example
log1p_expression(&matrix, None)?;
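The transformation itself is ln(1 + x) applied to each entry. The sketch below shows it on a dense `Vec<Vec<f64>>` using the standard library's `f64::ln_1p`; the helper name `log1p_in_place` is hypothetical and this is not the module's own code.

```rust
/// Applies ln(1 + x) to every entry in place.
/// Hypothetical helper for illustration only.
fn log1p_in_place(matrix: &mut [Vec<f64>]) {
    for row in matrix.iter_mut() {
        for value in row.iter_mut() {
            // ln_1p is more accurate than (1.0 + x).ln() for small x.
            *value = value.ln_1p();
        }
    }
}
```

Because ln(1 + 0) = 0, zero entries stay zero, which is why this transform can be applied to sparse expression data without changing its sparsity pattern.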
Filtering
mark_filter_cells
Creates a boolean mask indicating which cells pass all specified filtering criteria.
pub fn mark_filter_cells<I, T>(
    anndata: &IMAnnData,
    min_genes: Option<I>,
    max_genes: Option<I>,
    min_counts: Option<T>,
    max_counts: Option<T>,
    min_fraction: Option<T>,
    max_fraction: Option<T>
) -> anyhow::Result<Vec<bool>>
where
    I: PrimInt + Unsigned + Zero + AddAssign + Into<T>,
    T: Float + NumCast + AddAssign + Sum
Type Parameters
I: Integer type for counting genes (typically u32)
T: Floating-point type for counts and fractions (typically f64)
Parameters
anndata: Reference to the AnnData object
min_genes: Minimum number of expressed genes required for a cell
max_genes: Maximum number of expressed genes allowed for a cell
min_counts: Minimum total count required for a cell
max_counts: Maximum total count allowed for a cell
min_fraction: Minimum fraction of all genes that must be expressed in a cell
max_fraction: Maximum fraction of all genes that may be expressed in a cell
Returns
anyhow::Result<Vec<bool>>: Boolean vector where true indicates cells that pass all filters
Example
let cell_mask = mark_filter_cells::<u32, f64>(
    &adata,
    Some(200),   // Minimum genes
    Some(5000),  // Maximum genes
    Some(500.0), // Minimum counts
    None,        // No maximum counts
    None, None   // No fraction thresholds
)?;
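The idea behind the mask can be sketched on a dense cells-by-genes matrix: for each cell, count the expressed genes and the total counts, then require every enabled threshold to hold. The struct `CellThresholds` and the function `cell_mask` are hypothetical names for this illustration, the fraction thresholds are left out for brevity, and treating thresholds as inclusive is an assumption rather than documented behavior.

```rust
/// Optional thresholds; `None` disables a check.
/// Hypothetical struct for illustration only.
struct CellThresholds {
    min_genes: Option<u32>,
    max_genes: Option<u32>,
    min_counts: Option<f64>,
    max_counts: Option<f64>,
}

/// Returns one boolean per row (cell): `true` when the cell passes every enabled check.
/// Bounds are treated as inclusive here, which is an assumption of this sketch.
fn cell_mask(matrix: &[Vec<f64>], t: &CellThresholds) -> Vec<bool> {
    matrix
        .iter()
        .map(|cell| {
            // A gene counts as expressed when its value is above zero.
            let n_genes = cell.iter().filter(|&&v| v > 0.0).count() as u32;
            let total: f64 = cell.iter().sum();
            t.min_genes.map_or(true, |m| n_genes >= m)
                && t.max_genes.map_or(true, |m| n_genes <= m)
                && t.min_counts.map_or(true, |m| total >= m)
                && t.max_counts.map_or(true, |m| total <= m)
        })
        .collect()
}
```

Returning a mask instead of filtering in place lets the caller combine it with other masks before any data is dropped.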
mark_filter_genes
Creates a boolean mask indicating which genes pass all specified filtering criteria.
pub fn mark_filter_genes<I, T>(
    anndata: &IMAnnData,
    min_cells: Option<I>,
    max_cells: Option<I>,
    min_counts: Option<T>,
    max_counts: Option<T>,
    min_fraction: Option<T>,
    max_fraction: Option<T>
) -> anyhow::Result<Vec<bool>>
where
    I: PrimInt + Unsigned + Zero + AddAssign + Into<T>,
    T: Float + NumCast + AddAssign + Sum
Type Parameters
I: Integer type for counting cells (typically u32)
T: Floating-point type for counts and fractions (typically f64)
Parameters
anndata: Reference to the AnnData object
min_cells: Minimum number of cells that must express a gene
max_cells: Maximum number of cells that may express a gene
min_counts: Minimum total count required for a gene
max_counts: Maximum total count allowed for a gene
min_fraction: Minimum fraction of all cells that must express a gene
max_fraction: Maximum fraction of all cells that may express a gene
Returns
anyhow::Result<Vec<bool>>: Boolean vector where true indicates genes that pass all filters
Example
let gene_mask = mark_filter_genes::<u32, f64>(
    &adata,
    Some(3),     // Expressed in at least 3 cells
    None,        // No maximum cells threshold
    Some(10.0),  // At least 10 total counts
    None,        // No maximum counts threshold
    Some(0.001), // Expressed in at least 0.1% of cells
    None         // No maximum fraction threshold
)?;
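The fraction thresholds are easiest to read as "number of expressing cells divided by the total number of cells". The sketch below computes that quantity per gene on a dense, rectangular cells-by-genes matrix and derives a mask from a minimum fraction. Both function names are hypothetical, and the inclusive comparison is an assumption of this sketch rather than documented behavior.

```rust
/// For each column (gene), returns the fraction of rows (cells) with a value above zero.
/// Assumes a rectangular matrix; hypothetical helper for illustration only.
fn expressing_fraction(matrix: &[Vec<f64>]) -> Vec<f64> {
    let n_cells = matrix.len();
    let n_genes = matrix.first().map_or(0, |row| row.len());
    let mut counts = vec![0usize; n_genes];
    for cell in matrix {
        for (gene, &value) in cell.iter().enumerate() {
            if value > 0.0 {
                counts[gene] += 1;
            }
        }
    }
    counts.iter().map(|&c| c as f64 / n_cells as f64).collect()
}

/// Keeps genes whose expressing fraction is at least `min_fraction`.
fn gene_mask_by_min_fraction(matrix: &[Vec<f64>], min_fraction: f64) -> Vec<bool> {
    expressing_fraction(matrix)
        .into_iter()
        .map(|f| f >= min_fraction)
        .collect()
}
```

With a threshold of 0.001 and 10,000 cells, for example, a gene needs to be detected in at least 10 cells to pass.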
Highly Variable Genes
compute_highly_variable_genes
Identifies highly variable genes using statistical methods.
pub fn compute_highly_variable_genes(
    adata: &IMAnnData,
    params: Option<HVGParams>
) -> anyhow::Result<()>
Parameters
adata: Reference to the AnnData object
params: Optional HVGParams struct with the following fields:
    min_mean: Minimum mean expression (default: 0.0125)
    max_mean: Maximum mean expression (default: 3.0)
    min_dispersion: Minimum dispersion (default: 0.5)
    max_dispersion: Maximum dispersion (default: Infinity)
    n_bins: Number of bins for the mean-variance relationship (default: 20)
    n_top_genes: Optional number of top variable genes to select
    flavor: Statistical method (FlavorType::Seurat, FlavorType::CellRanger, or FlavorType::SVR)
    span: Span parameter for trend fitting (default: 0.3)
    batch_key: Optional column name for batch correction
Returns
anyhow::Result<()>: Success or error
Side Effects
Adds columns to adata.var():
means: Mean expression per gene
dispersions: Dispersion values
dispersions_norm: Normalized dispersion values
highly_variable: Boolean indicating highly variable genes
dispersions_normalized_standardized: Standardized dispersion values
Example
// Default parameters
compute_highly_variable_genes(&adata, None)?;
// Custom parameters
let params = HVGParams {
    min_mean: 0.01,
    max_mean: 5.0,
    min_dispersion: 0.5,
    max_dispersion: f64::INFINITY,
    n_bins: 20,
    n_top_genes: Some(2000),
    flavor: FlavorType::Seurat,
    span: 0.3,
    batch_key: None,
};
compute_highly_variable_genes(&adata, Some(params))?;
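As a rough illustration of the quantities behind the means and dispersions columns, the sketch below computes a per-gene mean and a raw dispersion (sample variance divided by mean) on a dense cells-by-genes matrix, then applies mean and dispersion cutoffs directly. This is a deliberate simplification: the binning and normalization steps that the flavors perform are omitted, the function names are hypothetical, at least two cells are assumed, and nothing here should be read as the module's actual algorithm.

```rust
/// Per-gene (mean, dispersion) over rows (cells), where dispersion is the
/// sample variance divided by the mean. Genes with zero mean get 0.0 here.
/// Assumes a rectangular matrix with at least two rows; illustration only.
fn mean_and_dispersion(matrix: &[Vec<f64>]) -> Vec<(f64, f64)> {
    let n_cells = matrix.len() as f64;
    let n_genes = matrix.first().map_or(0, |row| row.len());
    (0..n_genes)
        .map(|g| {
            let mean = matrix.iter().map(|cell| cell[g]).sum::<f64>() / n_cells;
            let var = matrix
                .iter()
                .map(|cell| (cell[g] - mean).powi(2))
                .sum::<f64>()
                / (n_cells - 1.0);
            let dispersion = if mean > 0.0 { var / mean } else { 0.0 };
            (mean, dispersion)
        })
        .collect()
}

/// Flags genes whose mean lies in [min_mean, max_mean] and whose raw
/// dispersion is at least `min_dispersion` (no binning or normalization).
fn flag_variable(
    stats: &[(f64, f64)],
    min_mean: f64,
    max_mean: f64,
    min_dispersion: f64,
) -> Vec<bool> {
    stats
        .iter()
        .map(|&(m, d)| m >= min_mean && m <= max_mean && d >= min_dispersion)
        .collect()
}
```

A gene with constant expression has zero variance and is never flagged, while a gene whose variance is large relative to its mean is.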
Differential Expression
rank_gene_groups
Performs differential expression analysis between groups of cells.
pub fn rank_gene_groups(
    adata: &IMAnnData,
    groupby: &str,
    reference: Option<&str>,
    groups: Option<&[&str]>,
    key_added: Option<&str>,
    method: Option<TestMethod>,
    n_genes: Option<usize>,
    correction_method: CorrectionMethod,
    compute_logfoldchanges: Option<bool>,
    pseudocount: Option<f64>
) -> anyhow::Result<()>
Parameters
adata: Reference to the AnnData object
groupby: Column name in obs containing group information
reference: Reference group name for comparison (None or "rest" uses all other cells as the reference)
groups: Groups to test (None tests all groups)
key_added: Key for storing results (None or empty uses "rank_genes_groups")
method: Statistical test method (default: TestMethod::TTest(TTestType::Welch))
n_genes: Number of top genes to report (default: 100)
correction_method: Multiple testing correction method:
    CorrectionMethod::Bonferroni
    CorrectionMethod::BenjaminiHochberg
    CorrectionMethod::BenjaminiYekutieli
    CorrectionMethod::HolmBonferroni
    CorrectionMethod::Hochberg
    CorrectionMethod::StoreyQValue
compute_logfoldchanges: Whether to compute log fold changes (default: true)
pseudocount: Pseudocount for log fold change calculation (default: 1.0)
Returns
anyhow::Result<()>: Success or error
Side Effects
Adds results to adata.uns() under the specified key:
{key}_scores: Test statistics per group
{key}_pvals: P-values per group
{key}_pvals_adj: Adjusted p-values per group
{key}_logfoldchanges: Log fold changes per group
{key}_names: Gene names per group
{key}_params_reference: Reference group information
{key}_params_method: Method information
{key}_params_groupby: Group column information
{key}_groups: List of tested groups
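To show how adjusted p-values relate to raw ones, here is a minimal sketch of the standard Benjamini-Hochberg step-up adjustment, the procedure that CorrectionMethod::BenjaminiHochberg is named after. The function `benjamini_hochberg` is a hypothetical standalone helper and not the module's code; how the module itself implements the correction is not stated here.

```rust
/// Benjamini-Hochberg adjusted p-values, returned in the input order.
/// Hypothetical helper for illustration only.
fn benjamini_hochberg(pvals: &[f64]) -> Vec<f64> {
    let n = pvals.len();
    // Indices sorted by ascending p-value.
    let mut order: Vec<usize> = (0..n).collect();
    order.sort_by(|&a, &b| pvals[a].total_cmp(&pvals[b]));

    let mut adjusted = vec![0.0; n];
    let mut running_min = 1.0_f64;
    // Walk from the largest p-value down so adjusted values stay monotone
    // and never exceed 1.0.
    for (rank, &idx) in order.iter().enumerate().rev() {
        let scaled = pvals[idx] * n as f64 / (rank as f64 + 1.0);
        running_min = running_min.min(scaled);
        adjusted[idx] = running_min;
    }
    adjusted
}
```

Each raw p-value is scaled by the number of tests divided by its rank, so the smallest p-values are penalized most and the largest is left unchanged.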
Example
rank_gene_groups(
    &adata,
    "cell_type",                 // Column with group info
    Some("control"),             // Reference group
    Some(&["type_a", "type_b"]), // Groups to test
    Some("de_results"),          // Key for storing results
    Some(TestMethod::TTest(TTestType::Welch)),
    Some(100),                   // Top genes to report
    CorrectionMethod::BenjaminiHochberg,
    Some(true),                  // Compute log fold changes
    Some(1.0)                    // Pseudocount
)?;
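For a single gene, the default Welch test and the log fold change can be sketched as follows. The helper names are hypothetical, each group is assumed to contain at least two cells, and the fold-change definition shown (log2 of the ratio of pseudocount-shifted means) is one common convention; the module's exact formula and log base are not stated above. Converting the statistic into a p-value needs the t-distribution CDF, which is omitted.

```rust
/// Arithmetic mean of a non-empty slice.
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// Unbiased sample variance; assumes at least two values.
fn sample_variance(xs: &[f64]) -> f64 {
    let m = mean(xs);
    xs.iter().map(|x| (x - m).powi(2)).sum::<f64>() / (xs.len() as f64 - 1.0)
}

/// Welch's t statistic for `group` versus `reference` (unequal variances).
fn welch_t(group: &[f64], reference: &[f64]) -> f64 {
    let se2 = sample_variance(group) / group.len() as f64
        + sample_variance(reference) / reference.len() as f64;
    (mean(group) - mean(reference)) / se2.sqrt()
}

/// log2 fold change of the group means, with `pseudocount` added to both
/// means so that a zero mean does not produce an infinite ratio.
fn log2_fold_change(group: &[f64], reference: &[f64], pseudocount: f64) -> f64 {
    ((mean(group) + pseudocount) / (mean(reference) + pseudocount)).log2()
}
```

A positive statistic and a positive log fold change both indicate higher expression in the tested group than in the reference.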