Package 'countprop' reference manual

Title:	Calculate Model-Based Metrics of Proportionality on Count-Based Compositional Data
Description:	Calculates metrics of proportionality using the logit-normal multinomial model. It can also provide empirical and plugin estimates of these metrics.
Authors:	Kevin McGregor [aut, cre, cph], Nneka Okaeme [aut]
Maintainer:	Kevin McGregor <[email protected]>
License:	GPL (>=3)
Version:	1.0.1
Built:	2025-03-11 03:45:31 UTC
Source:	https://github.com/kevinmcgregor/countprop

Extended Bayesian Information Criterion

Description

Calculates the Extended Bayesian Information Criterion (EBIC) of a model. Used for model selection to asses the fit of the multinomial logit-Normal model which includes a graphical lasso penalty.

Usage

ebic(l, n, d, df, gamma)
ebic(l, n, d, df, gamma)

Arguments

`l`	Log-likelihood estimates of the model
`n`	Number of rows of the data set for which the log-likelihood has been calculated
`d`	The size of the (k-1) by (k-1) covariance matrix of a k by k count-compositional data matrix
`df`	Degrees of freedom
`gamma`	A tuning parameter. Larger values means more penalization

Value

The value of the EBIC.

Note

The graphical lasso penalty is the sum of the absolute value of the elements of the covariance matrix Sigma. The penalization parameter lambda controls the sparsity of Sigma.

Examples

data(singlecell)
mle <- mleLR(singlecell, lambda.gl=0.5)
log.lik_1 <- mle$est[[1]]$log.lik
n <- NROW(singlecell)
k <- NCOL(singlecell)
df_1 <- mle$est[[1]]$df

ebic(log.lik_1, n, k, df_1, 0.1)

data(singlecell)
mle <- mleLR(singlecell, lambda.gl=0.5)
log.lik_1 <- mle$est[[1]]$log.lik
n <- NROW(singlecell)
k <- NCOL(singlecell)
df_1 <- mle$est[[1]]$df

ebic(log.lik_1, n, k, df_1, 0.1)

Extended Bayesian Information Criterion Plot

Description

Plots the extended Bayesian information criterion (EBIC) of the model fit for various penalization parameters lambda.

Usage

ebicPlot(fit, xlog = TRUE, col = "darkred")
ebicPlot(fit, xlog = TRUE, col = "darkred")

Arguments

`fit`	The model fit object from `mleLR()`
`xlog`	TRUE or FALSE. Renders plot with the x-axis in the log-scale if `TRUE`
`col`	Colour of the plot (character)

Value

Plot of the EBIC (y-axis) against each lambda (x-axis).

Examples

data(singlecell)
mle <- mlePath(singlecell, tol=1e-4, tol.nr=1e-4, n.lambda = 2, n.cores = 1)

ebicPlot(mle, xlog = TRUE)

data(singlecell)
mle <- mlePath(singlecell, tol=1e-4, tol.nr=1e-4, n.lambda = 2, n.cores = 1)

ebicPlot(mle, xlog = TRUE)

Logit Normal Variation

Description

Estimates the variation matrix of count-compositional data based on a multinomial logit-Normal distribution. Estimation is performed using only the parameters of the distribution.

Usage

logitNormalVariation(
  mu,
  Sigma,
  type = c("standard", "phi", "phis", "rho"),
  order = c("second", "first")
)
logitNormalVariation(
  mu,
  Sigma,
  type = c("standard", "phi", "phis", "rho"),
  order = c("second", "first")
)

Arguments

`mu`	The mle estimate of the mu matrix
`Sigma`	The mle estimate of the Sigma matrix
`type`	Type of variation metric to be calculated: `standard`, `phi`, `phis` (a symmetrical version of `phi`), `rho`, or `logp` (the variance-covariance matrix of log-transformed proportions)
`order`	The order of the Taylor-series approximation to be used in the estimation

Value

An estimate of the requested metric of proportionality.

Examples

data(singlecell)
mle <- mleLR(singlecell)
mu.hat <- mle$mu
Sigma.hat <- mle$Sigma

logitNormalVariation(mu.hat, Sigma.hat)
logitNormalVariation(mu.hat, Sigma.hat, type="phi")
logitNormalVariation(mu.hat, Sigma.hat, type="rho")

data(singlecell)
mle <- mleLR(singlecell)
mu.hat <- mle$mu
Sigma.hat <- mle$Sigma

logitNormalVariation(mu.hat, Sigma.hat)
logitNormalVariation(mu.hat, Sigma.hat, type="phi")
logitNormalVariation(mu.hat, Sigma.hat, type="rho")

Log-Likelihood

Description

Calculates the log-likelihood, under the multinomial logit-Normal model.

Usage

logLik(v, y, ni, S, invSigma)
logLik(v, y, ni, S, invSigma)

Arguments

`v`	The additive log-ratio transform of y
`y`	Compositional dataset
`ni`	The row sums of y
`S`	Covariance of `v`
`invSigma`	The inverse of the Sigma matrix

Value

The estimated log-likelihood under the Multinomial logit-Normal distribution.

Examples

data(singlecell)
mle.sim <- mlePath(singlecell, tol=1e-4, tol.nr=1e-4, n.lambda = 2, n.cores = 1)

n <- NROW(singlecell)


logLik(mle.sim$est.min$v,
      singlecell,
      n,
      cov(mle.sim$est.min$v),
      mle.sim$est.min$Sigma.inv)

data(singlecell)
mle.sim <- mlePath(singlecell, tol=1e-4, tol.nr=1e-4, n.lambda = 2, n.cores = 1)

n <- NROW(singlecell)


logLik(mle.sim$est.min$v,
      singlecell,
      n,
      cov(mle.sim$est.min$v),
      mle.sim$est.min$Sigma.inv)

Full logp Variance-Covariance

Description

Estimates the variance-covariance of the log of the proportions using a Taylor-series approximation.

Usage

logVarTaylorFull(
  mu,
  Sigma,
  transf = c("alr", "clr"),
  order = c("second", "first")
)
logVarTaylorFull(
  mu,
  Sigma,
  transf = c("alr", "clr"),
  order = c("second", "first")
)

Arguments

`mu`	The mean vector of the log-ratio-transformed data (ALR or CLR)
`Sigma`	The variance-covariance matrix of the log-ratio-transformed data (ALR or CLR)
`transf`	The desired transformation. If `transf="alr"` the inverse additive log-ratio transformation is applied. If `transf="clr"` the inverse centered log-ratio transformation is applied.
`order`	The desired order of the Taylor Series approximation

Value

The estimated variance-covariance matrix for log p.

Examples

data(singlecell)
mle <- mleLR(singlecell)
mu <- mle$mu
Sigma <- mle$Sigma

logVarTaylorFull(mu, Sigma)

data(singlecell)
mle <- mleLR(singlecell)
mu <- mle$mu
Sigma <- mle$Sigma

logVarTaylorFull(mu, Sigma)

Maximum Likelihood Estimate for multinomial logit-normal model

Description

Returns the maximum likelihood estimates of multinomial logit-normal model parameters given a count-compositional dataset. The MLE procedure is based on the multinomial logit-Normal distribution, using the EM algorithm from Hoff (2003).

Usage

mleLR(
  y,
  max.iter = 10000,
  max.iter.nr = 100,
  tol = 1e-06,
  tol.nr = 1e-06,
  lambda.gl = 0,
  gamma = 0.1,
  verbose = FALSE
)
mleLR(
  y,
  max.iter = 10000,
  max.iter.nr = 100,
  tol = 1e-06,
  tol.nr = 1e-06,
  lambda.gl = 0,
  gamma = 0.1,
  verbose = FALSE
)

Arguments

`y`	Matrix of counts; samples are rows and features are columns.
`max.iter`	Maximum number of iterations
`max.iter.nr`	Maximum number of Newton-Raphson iterations
`tol`	Stopping rule
`tol.nr`	Stopping rule for the Newton-Raphson algorithm
`lambda.gl`	Penalization parameter lambda, for the graphical lasso penalty. Controls the sparsity of Sigma
`gamma`	Gamma value for EBIC calculation of the log-likelihood
`verbose`	If TRUE, print information as the functions run

Value

The additive log-ratio of y (v); maximum likelihood estimates of mu, Sigma, and Sigma.inv; the log-likelihood (log.lik); the EBIC (extended Bayesian information criterion) of the log-likelihood of the multinomial logit-Normal model with the graphical lasso penalty (ebic); degrees of freedom of the Sigma.inv matrix (df).

Note

The graphical lasso penalty is the sum of the absolute value of the elements of the covariance matrix Sigma. The penalization parameter lambda controls the sparsity of Sigma.

This function is also used within the mlePath() function.

Examples

data(singlecell)
mle <- mleLR(singlecell)

mle$mu
mle$Sigma
mle$ebic

data(singlecell)
mle <- mleLR(singlecell)

mle$mu
mle$Sigma
mle$ebic

Maximum Likelihood Estimator Paths

Description

Calculates the maximum likelihood estimates of the parameters for the mutlinomial logit-Normal distribution under various values of the penalization parameter lambda. Parameter lambda controls the sparsity of the covariance matrix Sigma, and penalizes the false large correlations that may arise in high-dimensional data.

Usage

mlePath(
  y,
  max.iter = 10000,
  max.iter.nr = 100,
  tol = 1e-06,
  tol.nr = 1e-06,
  lambda.gl = NULL,
  lambda.min.ratio = 0.1,
  n.lambda = 1,
  n.cores = 1,
  gamma = 0.1
)
mlePath(
  y,
  max.iter = 10000,
  max.iter.nr = 100,
  tol = 1e-06,
  tol.nr = 1e-06,
  lambda.gl = NULL,
  lambda.min.ratio = 0.1,
  n.lambda = 1,
  n.cores = 1,
  gamma = 0.1
)

Arguments

`y`	Matrix of counts; samples are rows and features are columns.
`max.iter`	Maximum number of iterations
`max.iter.nr`	Maximum number of Newton-Raphson iterations
`tol`	Stopping rule
`tol.nr`	Stopping rule for the Newton Raphson algorithm
`lambda.gl`	Vector of penalization parameters lambda, for the graphical lasso penalty
`lambda.min.ratio`	Minimum lambda ratio of the maximum lambda, used for the sequence of lambdas
`n.lambda`	Number of lambdas to evaluate the model on
`n.cores`	Number of cores to use (for parallel computation)
`gamma`	Gamma value for EBIC calculation of the log-likelihood

Value

The MLE estimates of y for each element lambda of lambda.gl, (est); the value of the estimates which produce the minimum EBIC, (est.min); the vector of lambdas used for graphical lasso, (lambda.gl); the index of the minimum EBIC (extended Bayesian information criterion), (min.idx); vector containing the EBIC for each lambda, (ebic).

Note

If using parallel computing, consider setting n.cores to be equal to the number of lambdas being evaluated for, n.lambda.

The graphical lasso penalty is the sum of the absolute value of the elements of the covariance matrix Sigma. The penalization parameter lambda controls the sparsity of Sigma.

Examples

data(singlecell)
mle.sim <- mlePath(singlecell, tol=1e-4, tol.nr=1e-4, n.lambda = 2, n.cores = 1)

mu.hat <- mle.sim$est.min$mu
Sigma.hat <- mle.sim$est.min$Sigma

data(singlecell)
mle.sim <- mlePath(singlecell, tol=1e-4, tol.nr=1e-4, n.lambda = 2, n.cores = 1)

mu.hat <- mle.sim$est.min$mu
Sigma.hat <- mle.sim$est.min$Sigma

Naive (Empirical) Variation

Description

Naive (empirical) estimates of proportionality metrics using only the observed counts.

Usage

naiveVariation(
  counts,
  pseudo.count = 0,
  type = c("standard", "phi", "phis", "rho", "logp"),
  impute.zeros = TRUE,
  ...
)
naiveVariation(
  counts,
  pseudo.count = 0,
  type = c("standard", "phi", "phis", "rho", "logp"),
  impute.zeros = TRUE,
  ...
)

Arguments

`counts`	Matrix of counts; samples are rows and features are columns
`pseudo.count`	Positive count to be added to all elements of count matrix.
`type`	Type of variation metric to be calculated: `standard`, `phi`, `phis` (a symmetric version of `phi`), `rho`, or `logp` (the variance-covariance matrix of log-transformed proportions)
`impute.zeros`	If TRUE, then `cmultRepl()` from the `zCompositions` package is used to impute zero values in the counts matrix.
`...`	Optional arguments passed to zero-imputation function `cmultRepl()`

Value

An estimate of the requested metric of proportionality.

Examples

#' data(singlecell)

naiveVariation(singlecell)
naiveVariation(singlecell, type="phi")
naiveVariation(singlecell, type="rho")

#' data(singlecell)

naiveVariation(singlecell)
naiveVariation(singlecell, type="phi")
naiveVariation(singlecell, type="rho")

Plugin Variation

Description

Estimates the variation matrix of count-compositional data based on a the same approximation used in logitNormalVariation() only for this function it uses empirical estimates of mu and Sigma. Also performs zero-imputation using cmultRepl() from the zCompositions package.

Usage

pluginVariation(
  counts,
  type = c("standard", "phi", "phis", "rho"),
  order = c("second", "first"),
  impute.zeros = TRUE,
  ...
)
pluginVariation(
  counts,
  type = c("standard", "phi", "phis", "rho"),
  order = c("second", "first"),
  impute.zeros = TRUE,
  ...
)

Arguments

`counts`	Matrix of counts; samples are rows and features are columns.
`type`	Type of variation metric to be calculated: `standard`, `phi`, `phis` (a symmetrical version of `phi`), `rho`, or `logp` (the variance-covariance matrix of log-transformed proportions).
`order`	The order of the Taylor-series approximation to be used in the estimation
`impute.zeros`	If TRUE, then `cmultRepl()` from the `zCompositions` package is used to impute zero values in the counts matrix.
`...`	Optional arguments passed to zero-imputation function `cmultRepl()`

Value

An estimate of the requested metric of proportionality.

Examples

data(singlecell)

pluginVariation(singlecell)
pluginVariation(singlecell, type="phi")
pluginVariation(singlecell, type="rho")

data(singlecell)

pluginVariation(singlecell)
pluginVariation(singlecell, type="phi")
pluginVariation(singlecell, type="rho")

Single cell sequencing data from mouse embryonic stem cells in G1 phase

Description

A subset of single cell data from Buettner et al. 2015. Contains single cell measurements from 96 mouse embryonic stem cells all in G1 phase.

Usage

data(singlecell)
data(singlecell)

Format

## 'singlecell' A matrix with 96 rows and 10 columns.

Source

<https://www.ebi.ac.uk/biostudies/arrayexpress/studies/E-MTAB-2805>

Examples

data(singlecell)

data(singlecell)

Package 'countprop'

Help Index

Extended Bayesian Information Criterion

Description

Usage

Arguments

Value

Note

Examples

Extended Bayesian Information Criterion Plot

Description

Usage

Arguments

Value

Examples

Logit Normal Variation

Description

Usage

Arguments

Value

Examples

Log-Likelihood

Description

Usage

Arguments

Value

Examples

Full logp Variance-Covariance

Description

Usage

Arguments

Value

Examples

Maximum Likelihood Estimate for multinomial logit-normal model

Description

Usage

Arguments

Value

Note

Examples

Maximum Likelihood Estimator Paths

Description

Usage

Arguments

Value

Note

Examples

Naive (Empirical) Variation

Description

Usage

Arguments

Value

Examples

Plugin Variation

Description

Usage

Arguments

Value

Examples

Single cell sequencing data from mouse embryonic stem cells in G1 phase

Description

Usage

Format

Source

Examples