Title: | Utility Functions for Single-Cell RNA Sequencing Data |
---|---|
Description: | Analysis of single-cell RNA sequencing data can be simple and clear with the right utility functions. This package collects such functions, aiming to fulfill the following criteria: code clarity over performance (i.e. plain R code instead of C code), most important analysis steps over completeness (analysis 'by hand', not automated integration etc.), emphasis on quantitative visualization (intensity-coded color scale, etc.). |
Authors: | Felix Frauhammer [aut, cre], Simon Anders [ctb] (Simon Anders wrote the colVars_spm function.) |
Maintainer: | Felix Frauhammer <[email protected]> |
License: | GPL-3 |
Version: | 0.1.0.9000 |
Built: | 2025-02-02 03:40:27 UTC |
Source: | https://github.com/felixthestudent/scutils |
Finds breaks that are powers of 2, and forces inclusion of upper and lower limits (displaying the closed interval). Including limits specifically is particularly useful for ggplot2's color/fill, as it emphasizes the meaning of maximal/minimal color intensities (see examples).
closed_breaks_log2(lims)
lims |
Vector with lower and upper limits (in that order) of the data that you want breaks for. |
The feat function uses closed_breaks_log2 to color by gene expression, where the maximal expression gives valuable intuition for a gene's overall expression strength. For x- or y-axis (scale_*_log10), I still recommend breaks_log from the scales package.
Numeric vector with breaks.
# closed breaks include the maximum, breaks_log does not:
closed_breaks_log2(lims = c(.01, 977.1))
scales::breaks_log()(c(.01, 977.1))
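The idea of "powers of 2 plus both limits" can be illustrated with a minimal base-R sketch. The function name closed_breaks_sketch is hypothetical and this is not the package's implementation; in particular, it keeps every power of 2 inside the interval, whereas the real function may thin them out for readability.

```r
# Hypothetical sketch, not the scutils implementation:
# all powers of 2 inside [lower, upper], with both limits forced in.
closed_breaks_sketch <- function(lims) {
  stopifnot(length(lims) == 2, lims[1] > 0, lims[1] < lims[2])
  lo <- ceiling(log2(lims[1]))  # smallest exponent inside the interval
  hi <- floor(log2(lims[2]))    # largest exponent inside the interval
  inner <- if (lo <= hi) 2^(lo:hi) else numeric(0)
  # unique() avoids duplicates when a limit is itself a power of 2
  sort(unique(c(lims[1], inner, lims[2])))
}

breaks <- closed_breaks_sketch(c(.01, 977.1))
range(breaks)  # first and last break are the limits themselves
```

Because the limits are always part of the result, the ends of a color bar built from these breaks show the actual minimum and maximum of the data.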
Complements the closed_breaks_log2 function.
closed_labels(x, min_is_zero = FALSE)
x |
Vector of breaks for which to produce labels.
Typically, this is the output of |
min_is_zero |
Should the smallest break be displayed as zero (TRUE) or as the actual value (FALSE). Default: FALSE |
This is a helper for the feat function. feat replaces numeric zeros with the next-smallest expression value to avoid taking the logarithm of zero. min_is_zero can be used to display the lowest break of the color scale as zero in these cases.
Character vector with labels, used by the feat function.
label_scientific
label_number_auto
# human readable output:
closed_labels(c(.001111,.122, 0.5, 10, 100, 1800))
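The zero handling described for feat and min_is_zero can be sketched in base R. The helper name replace_zeros_sketch and the labelling step are assumptions made for illustration; the sketch also assumes non-negative expression values with at least one positive entry.

```r
# Hypothetical sketch of the zero handling, not the scutils code.
replace_zeros_sketch <- function(expression) {
  positive <- expression[expression > 0]
  if (length(positive) == 0) stop("need at least one positive value")
  # zeros take the smallest non-zero value, so log2() stays finite
  expression[expression == 0] <- min(positive)
  expression
}

expr <- c(0, 0, 1, 4, 16)
expr_nozero <- replace_zeros_sketch(expr)
log2(expr_nozero)  # no -Inf values

# The lowest break now sits at the smallest non-zero value, but it
# stands in for true zeros, so it can be labelled "0" (the role that
# min_is_zero = TRUE plays for closed_labels).
breaks <- range(expr_nozero)
labels <- as.character(breaks)
if (min(expr) == 0) labels[1] <- "0"
labels
```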
Compute variance for each column / each row of a dgCMatrix (from Matrix package).
colVars_spm(spm)
rowVars_spm(spm)
spm |
A sparse matrix of class dgCMatrix from the Matrix package. |
The only supported format currently is dgCMatrix. While the Matrix package has other formats, this one is used for scRNAseq raw count data. Function code written by Simon Anders.
Vector with variances.
vignette("Intro2Matrix", package="Matrix")
CsparseMatrix-class
library(Matrix)
mat <- as(matrix(rpois(900,1), ncol=3), "dgCMatrix")
colVars_spm(mat)
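Column variances can be computed without converting the sparse matrix to a dense one. The following is a minimal sketch of one way to do this; it is not the code written by Simon Anders, and the name col_vars_sketch is hypothetical. It uses the identity var = (sum(x^2) - n * mean^2) / (n - 1), which is simple but can lose precision when values are large relative to their spread.

```r
library(Matrix)

# Hypothetical sketch, not the scutils implementation.
col_vars_sketch <- function(spm) {
  stopifnot(inherits(spm, "dgCMatrix"))
  n <- nrow(spm)
  means <- Matrix::colSums(spm) / n
  # square only the stored non-zero entries; zeros contribute nothing
  squared <- spm
  squared@x <- squared@x^2
  sum_sq <- Matrix::colSums(squared)
  (sum_sq - n * means^2) / (n - 1)
}

set.seed(1)
dense <- matrix(as.numeric(rpois(900, 1)), ncol = 3)
sparse <- as(dense, "CsparseMatrix")  # a dgCMatrix for this input
col_vars_sketch(sparse)
apply(dense, 2, var)  # dense reference for comparison
```

Row variances could be obtained the same way with rowSums, or by applying the column version to the transposed matrix.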
Highlight gene expression data in a 2D-embedding (UMAP, tSNE, etc.).
feat(embedding, expression, legend_name = "Expression")
embedding |
A matrix/data.frame/tibble/... with exactly two columns.
If colnames are missing, the axes will be named "Dim1" and "Dim2".
Other classes than matrix/data.frame/tibble are possible, as long as
|
expression |
Numeric vector with expression values of the gene of
interest. Order has to correspond to the row order in |
legend_name |
Text displayed above the legend. Most commonly the name of the displayed gene. |
This function discourages customization on purpose, because it bundles geoms, themes and settings that I found important for visualizing gene expression in scRNAseq data:
coord_fixed, to avoid distortion of embeddings
geom_point with size=.4, to ameliorate overplotting
No background grid, because distances and axis units in embeddings do not carry meaning for most dimensionality reduction techniques.
Intensity-coded color scales (viridis) displayed with log2-transformation. This keeps the plot readable for colorblind viewers and suits gene expression data, which is usually log-normally distributed.
Color scale breaks are displayed as 'closed interval', i.e. max(expression) and min(expression) are the most extreme breaks. Rounding makes them human-readable. This functionality is provided by closed_breaks_log2 and closed_labels.
If you insist on customizing, think of this function as a starting point: type feat into your console to print its code, then copy and adapt it.
A ggplot2 object storing a colored scatter plot.
ggplot, closed_labels, closed_breaks_log2
# expression goes from 0 to 22:
set.seed(100)
feat(matrix(rnorm(2000, c(.1, 3)), ncol=2), rpois(1000, c(.1, 11)))
# expression goes from 2 to 52:
set.seed(100)
feat(matrix(rnorm(2000, c(.1, 3)), ncol=2), rpois(1000, c(10, 31)))
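For readers who want to see how the bundled settings fit together, here is a rough ggplot2 sketch. It is not the source of feat: the name feat_sketch is hypothetical, the embedding is assumed to be a matrix or data.frame, and the color-scale breaks are reduced to the two limits with plain rounding instead of closed_breaks_log2 and closed_labels.

```r
library(ggplot2)

# Hypothetical sketch, not the scutils implementation of feat().
feat_sketch <- function(embedding, expression, legend_name = "Expression") {
  axis_names <- colnames(embedding)
  if (is.null(axis_names)) axis_names <- c("Dim1", "Dim2")
  df <- data.frame(Dim1 = embedding[, 1], Dim2 = embedding[, 2],
                   expression = expression)
  # zeros take the smallest non-zero value so the log2 scale is defined
  df$expression[df$expression == 0] <- min(df$expression[df$expression > 0])
  lims <- range(df$expression)
  ggplot(df, aes(x = Dim1, y = Dim2, colour = expression)) +
    geom_point(size = 0.4) +   # small points against overplotting
    coord_fixed() +            # equal axis scaling, no distortion
    scale_colour_viridis_c(name = legend_name, trans = "log2",
                           breaks = lims,
                           labels = as.character(signif(lims, 2))) +
    labs(x = axis_names[1], y = axis_names[2]) +
    theme_bw() +
    theme(panel.grid = element_blank())  # no background grid
}

set.seed(100)
p <- feat_sketch(matrix(rnorm(2000, c(.1, 3)), ncol = 2),
                 rpois(1000, c(.1, 11)))
```

Printing p draws the plot; because the result is an ordinary ggplot object, further layers can be added with +.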
Checks whether numbers are whole. In contrast to is.integer, is_wholenumber does not check the class but accepts all numbers that are whole within a tolerance.
is_wholenumber(x, tol = .Machine$double.eps^0.5)
x |
Number to test |
tol |
tolerance for testing |
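The difference from is.integer can be seen with a small sketch modelled on the is.wholenumber example in the base R help page for is.integer. The name is_wholenumber_sketch is hypothetical, and whether the package uses exactly this expression is an assumption.

```r
# Hypothetical sketch: a number counts as whole if it lies within
# `tol` of its nearest integer, regardless of storage class.
is_wholenumber_sketch <- function(x, tol = .Machine$double.eps^0.5) {
  abs(x - round(x)) < tol
}

is.integer(3)                      # FALSE: 3 is stored as a double
is_wholenumber_sketch(3)           # TRUE
is_wholenumber_sketch(3.5)         # FALSE
sqrt(2)^2 == 2                     # FALSE due to floating-point error
is_wholenumber_sketch(sqrt(2)^2)   # TRUE within the tolerance
```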