Title: | Calculate Model-Based Metrics of Proportionality on Count-Based Compositional Data |
---|---|
Description: | Calculates metrics of proportionality using the logit-normal multinomial model. It can also provide empirical and plugin estimates of these metrics. |
Authors: | Kevin McGregor [aut, cre, cph], Nneka Okaeme [aut] |
Maintainer: | Kevin McGregor <[email protected]> |
License: | GPL (>=3) |
Version: | 1.0.1 |
Built: | 2025-03-11 03:45:31 UTC |
Source: | https://github.com/kevinmcgregor/countprop |
Calculates the Extended Bayesian Information Criterion (EBIC) of a model. Used for model selection to assess the fit of the multinomial logit-Normal model, which includes a graphical lasso penalty.
ebic(l, n, d, df, gamma)
l |
Log-likelihood estimates of the model |
n |
Number of rows of the data set for which the log-likelihood has been calculated |
d |
The size of the (k-1) by (k-1) covariance matrix of an n by k count-compositional data matrix |
df |
Degrees of freedom |
gamma |
A tuning parameter. Larger values mean more penalization |
The value of the EBIC.
The graphical lasso penalty is the sum of the absolute values of the elements of the covariance matrix Sigma. The penalization parameter lambda controls the sparsity of Sigma.
data(singlecell)
mle <- mleLR(singlecell, lambda.gl=0.5)
# mleLR() returns log.lik and df as top-level elements
log.lik_1 <- mle$log.lik
n <- NROW(singlecell)
k <- NCOL(singlecell)
df_1 <- mle$df
ebic(log.lik_1, n, k, df_1, 0.1)
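For orientation, the EBIC for penalized graphical models is commonly written as -2*l + df*log(n) + 4*gamma*df*log(d). The sketch below assumes that form; `ebic_sketch` is a hypothetical name, the inputs are arbitrary illustrative values, and the package's own ebic() may differ in detail.

```r
# Hypothetical helper: EBIC in the commonly used form for graphical models.
# This form is an assumption, not taken from the countprop source.
ebic_sketch <- function(l, n, d, df, gamma) {
  -2 * l + df * log(n) + 4 * gamma * df * log(d)
}

# With gamma = 0 the criterion reduces to the ordinary BIC.
bic   <- ebic_sketch(l = -250, n = 96, d = 9, df = 12, gamma = 0)
ebic1 <- ebic_sketch(l = -250, n = 96, d = 9, df = 12, gamma = 0.1)
ebic1 > bic  # a larger gamma adds a larger penalty when d > 1
```

Under this form, gamma only matters through the term involving log(d), so it penalizes model size more strongly as the dimension grows.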
Plots the extended Bayesian information criterion (EBIC) of the model fit for various penalization parameters lambda.
ebicPlot(fit, xlog = TRUE, col = "darkred")
fit |
The model fit object from |
xlog |
TRUE or FALSE. Renders plot with the x-axis in the log-scale if |
col |
Colour of the plot (character) |
Plot of the EBIC (y-axis) against each lambda (x-axis).
data(singlecell)
mle <- mlePath(singlecell, tol=1e-4, tol.nr=1e-4, n.lambda = 2, n.cores = 1)
ebicPlot(mle, xlog = TRUE)
Estimates the variation matrix of count-compositional data based on a multinomial logit-Normal distribution. Estimation is performed using only the parameters of the distribution.
logitNormalVariation(
  mu,
  Sigma,
  type = c("standard", "phi", "phis", "rho"),
  order = c("second", "first")
)
mu |
The maximum likelihood estimate of the mu matrix |
Sigma |
The maximum likelihood estimate of the Sigma matrix |
type |
Type of variation metric to be calculated: |
order |
The order of the Taylor-series approximation to be used in the estimation |
An estimate of the requested metric of proportionality.
data(singlecell)
mle <- mleLR(singlecell)
mu.hat <- mle$mu
Sigma.hat <- mle$Sigma
logitNormalVariation(mu.hat, Sigma.hat)
logitNormalVariation(mu.hat, Sigma.hat, type="phi")
logitNormalVariation(mu.hat, Sigma.hat, type="rho")
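As a rough illustration of how a variation matrix can follow from the model parameters alone: if v holds additive log-ratio coordinates with the last category as the reference, then log p_i - log p_j equals v_i - v_j (with v_k = 0), so Var(log p_i - log p_j) is a linear function of Sigma. The sketch below assumes that reference convention; `variation_from_alr_cov` is a hypothetical name, and it covers only the log-ratio variances, not the "phi", "phis", or "rho" metrics.

```r
# Hypothetical helper: variation matrix T[i, j] = Var(log p_i - log p_j)
# from the (k-1) x (k-1) covariance of ALR coordinates.
# Assumes the last category is the ALR reference, so v_k = 0.
variation_from_alr_cov <- function(Sigma) {
  k <- nrow(Sigma) + 1
  S0 <- matrix(0, k, k)            # pad with a zero row/column for v_k
  S0[-k, -k] <- Sigma
  outer(diag(S0), diag(S0), "+") - 2 * S0
}

# Small illustrative covariance matrix (not estimated from any data set)
Sigma <- matrix(c(1.0, 0.3,
                  0.3, 2.0), nrow = 2, byrow = TRUE)
Tmat <- variation_from_alr_cov(Sigma)
Tmat
```

The result is symmetric with a zero diagonal, and its last column reproduces the diagonal of Sigma, since log-ratios against the reference are the ALR coordinates themselves.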
Calculates the log-likelihood under the multinomial logit-Normal model.
logLik(v, y, ni, S, invSigma)
v |
The additive log-ratio transform of y |
y |
Compositional dataset |
ni |
The row sums of y |
S |
Covariance of |
invSigma |
The inverse of the Sigma matrix |
The estimated log-likelihood under the multinomial logit-Normal distribution.
data(singlecell)
mle.sim <- mlePath(singlecell, tol=1e-4, tol.nr=1e-4, n.lambda = 2, n.cores = 1)
# ni is documented as the row sums of y
ni <- rowSums(singlecell)
logLik(mle.sim$est.min$v, singlecell, ni, cov(mle.sim$est.min$v), mle.sim$est.min$Sigma.inv)
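The argument v is described as an additive log-ratio (ALR) transform. A minimal base-R sketch of the ALR and its inverse is given below, assuming the last column is used as the reference category; `alr` and `alr_inv` are hypothetical helper names, and the proportions are small illustrative values rather than package output.

```r
# Additive log-ratio (ALR) transform of a matrix of proportions,
# using the last column as the reference (an assumption of this sketch).
alr <- function(p) {
  k <- ncol(p)
  log(p[, -k, drop = FALSE] / p[, k])
}

# Inverse ALR: map (k-1) coordinates back to k proportions.
alr_inv <- function(v) {
  e <- cbind(exp(v), 1)
  e / rowSums(e)
}

p <- matrix(c(0.2, 0.3, 0.5,
              0.6, 0.1, 0.3), nrow = 2, byrow = TRUE)
v <- alr(p)
all.equal(alr_inv(v), p)  # round trip recovers the proportions
```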
Estimates the variance-covariance of the log of the proportions using a Taylor-series approximation.
logVarTaylorFull(
  mu,
  Sigma,
  transf = c("alr", "clr"),
  order = c("second", "first")
)
mu |
The mean vector of the log-ratio-transformed data (ALR or CLR) |
Sigma |
The variance-covariance matrix of the log-ratio-transformed data (ALR or CLR) |
transf |
The desired transformation. If |
order |
The desired order of the Taylor Series approximation |
The estimated variance-covariance matrix for log p.
data(singlecell)
mle <- mleLR(singlecell)
mu <- mle$mu
Sigma <- mle$Sigma
logVarTaylorFull(mu, Sigma)
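To make the idea of a Taylor-series approximation concrete, here is a first-order (delta-method) sketch for the ALR case: Cov(log p) is approximated by J Sigma J', where J is the Jacobian of log p with respect to the ALR coordinates, evaluated at mu. It assumes mu is a plain vector and the last category is the reference; `log_var_first_order` is a hypothetical name, and the package's first- and second-order versions may be organized differently.

```r
# Hypothetical first-order (delta-method) approximation of Cov(log p)
# when v ~ N(mu, Sigma) are ALR coordinates with the last category as reference.
log_var_first_order <- function(mu, Sigma) {
  k <- length(mu) + 1
  p <- exp(mu) / (1 + sum(exp(mu)))          # first k-1 proportions at v = mu
  # Jacobian of log p with respect to v: rows = categories, cols = coordinates.
  # d log p_i / d v_j = 1{i == j} - p_j, and the reference row is -p_j.
  J <- rbind(diag(k - 1), 0) - matrix(p, nrow = k, ncol = k - 1, byrow = TRUE)
  J %*% Sigma %*% t(J)
}

# Small illustrative parameters (not estimated from any data set)
mu <- c(0.5, -1)
Sigma <- matrix(c(1.0, 0.3,
                  0.3, 2.0), nrow = 2, byrow = TRUE)
V <- log_var_first_order(mu, Sigma)

# Log-ratios against the reference are linear in v, so their variance is
# recovered exactly (up to rounding): this equals Sigma[1, 1].
V[1, 1] + V[3, 3] - 2 * V[1, 3]
```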
Returns the maximum likelihood estimates of the multinomial logit-Normal model parameters given a count-compositional dataset. The MLE procedure is based on the multinomial logit-Normal distribution, using the EM algorithm from Hoff (2003).
mleLR(
  y,
  max.iter = 10000,
  max.iter.nr = 100,
  tol = 1e-06,
  tol.nr = 1e-06,
  lambda.gl = 0,
  gamma = 0.1,
  verbose = FALSE
)
y |
Matrix of counts; samples are rows and features are columns. |
max.iter |
Maximum number of iterations |
max.iter.nr |
Maximum number of Newton-Raphson iterations |
tol |
Stopping rule |
tol.nr |
Stopping rule for the Newton-Raphson algorithm |
lambda.gl |
Penalization parameter lambda, for the graphical lasso penalty. Controls the sparsity of Sigma |
gamma |
Gamma value for EBIC calculation of the log-likelihood |
verbose |
If TRUE, print information as the function runs |
The additive log-ratio of y (v); maximum likelihood estimates of mu, Sigma, and Sigma.inv; the log-likelihood (log.lik); the EBIC (extended Bayesian information criterion) of the log-likelihood of the multinomial logit-Normal model with the graphical lasso penalty (ebic); degrees of freedom of the Sigma.inv matrix (df).
The graphical lasso penalty is the sum of the absolute values of the elements of the covariance matrix Sigma. The penalization parameter lambda controls the sparsity of Sigma.
This function is also used within the mlePath() function.
data(singlecell)
mle <- mleLR(singlecell)
mle$mu
mle$Sigma
mle$ebic
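The penalty described above can be written out in a couple of lines. The sketch assumes that lambda multiplies the sum of absolute entries, as in the standard graphical lasso, and that all entries (including the diagonal) are penalized; both points are assumptions, and `glasso_penalty` is a hypothetical name rather than a package function.

```r
# Penalty term as described above: lambda times the sum of absolute entries.
# `glasso_penalty` is a hypothetical name used only for illustration.
glasso_penalty <- function(Sigma, lambda) {
  lambda * sum(abs(Sigma))
}

# Small illustrative matrix (not estimated from any data set)
Sigma <- matrix(c( 1.0, -0.3,
                  -0.3,  2.0), nrow = 2, byrow = TRUE)
pen_half <- glasso_penalty(Sigma, lambda = 0.5)  # 0.5 * (1 + 0.3 + 0.3 + 2)
pen_zero <- glasso_penalty(Sigma, lambda = 0)    # lambda = 0 means no penalty
```

With lambda.gl = 0, the default in mleLR(), the penalty term vanishes and the fit is unpenalized.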
Calculates the maximum likelihood estimates of the parameters for the multinomial logit-Normal distribution under various values of the penalization parameter lambda. Parameter lambda controls the sparsity of the covariance matrix Sigma, and penalizes the false large correlations that may arise in high-dimensional data.
mlePath(
  y,
  max.iter = 10000,
  max.iter.nr = 100,
  tol = 1e-06,
  tol.nr = 1e-06,
  lambda.gl = NULL,
  lambda.min.ratio = 0.1,
  n.lambda = 1,
  n.cores = 1,
  gamma = 0.1
)
y |
Matrix of counts; samples are rows and features are columns. |
max.iter |
Maximum number of iterations |
max.iter.nr |
Maximum number of Newton-Raphson iterations |
tol |
Stopping rule |
tol.nr |
Stopping rule for the Newton-Raphson algorithm |
lambda.gl |
Vector of penalization parameters lambda, for the graphical lasso penalty |
lambda.min.ratio |
Ratio of the minimum lambda to the maximum lambda, used to build the sequence of lambdas |
n.lambda |
Number of lambdas to evaluate the model on |
n.cores |
Number of cores to use (for parallel computation) |
gamma |
Gamma value for EBIC calculation of the log-likelihood |
The maximum likelihood estimates for each element lambda of lambda.gl (est); the estimates that produce the minimum EBIC (est.min); the vector of lambdas used for the graphical lasso (lambda.gl); the index of the minimum EBIC (extended Bayesian information criterion) (min.idx); a vector containing the EBIC for each lambda (ebic).
If using parallel computing, consider setting n.cores equal to the number of lambdas being evaluated, n.lambda.
The graphical lasso penalty is the sum of the absolute values of the elements of the covariance matrix Sigma. The penalization parameter lambda controls the sparsity of Sigma.
data(singlecell)
mle.sim <- mlePath(singlecell, tol=1e-4, tol.nr=1e-4, n.lambda = 2, n.cores = 1)
mu.hat <- mle.sim$est.min$mu
Sigma.hat <- mle.sim$est.min$Sigma
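The roles of lambda.min.ratio, n.lambda, and min.idx can be illustrated with a small base-R sketch. It assumes a log-spaced sequence running from a maximum lambda down to lambda.min.ratio times that maximum; the spacing, the name `lambda.max`, and the helper `lambda_path` are assumptions of this sketch, and the EBIC values are made up purely to show the indexing.

```r
# Hypothetical construction of a lambda path: n.lambda values, log-spaced
# from lambda.max down to lambda.min.ratio * lambda.max. Both the spacing
# and the name lambda.max are assumptions of this sketch.
lambda_path <- function(lambda.max, lambda.min.ratio = 0.1, n.lambda = 5) {
  exp(seq(log(lambda.max), log(lambda.max * lambda.min.ratio),
          length.out = n.lambda))
}

lambdas <- lambda_path(lambda.max = 1, lambda.min.ratio = 0.1, n.lambda = 3)

# Selecting the fit with the smallest EBIC; these EBIC values are made up
# purely to show the indexing and are not results from any fit.
toy_ebic <- c(310.2, 295.7, 301.4)
min.idx <- which.min(toy_ebic)
lambdas[min.idx]
```

In the object returned by mlePath(), the analogous selection is already available through min.idx and est.min.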
Naive (empirical) estimates of proportionality metrics using only the observed counts.
naiveVariation(
  counts,
  pseudo.count = 0,
  type = c("standard", "phi", "phis", "rho", "logp"),
  impute.zeros = TRUE,
  ...
)
counts |
Matrix of counts; samples are rows and features are columns |
pseudo.count |
Positive count to be added to all elements of count matrix. |
type |
Type of variation metric to be calculated: |
impute.zeros |
If TRUE, then |
... |
Optional arguments passed to zero-imputation function |
An estimate of the requested metric of proportionality.
data(singlecell)
naiveVariation(singlecell)
naiveVariation(singlecell, type="phi")
naiveVariation(singlecell, type="rho")
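For readers unfamiliar with the metrics, the sketch below computes empirical versions under definitions commonly used in the proportionality literature: the log-ratio variance Var(log(x_i / x_j)), phi as that variance divided by Var(clr_i), and rho as one minus that variance divided by Var(clr_i) + Var(clr_j). Whether countprop uses exactly these forms is an assumption; `naive_metrics` is a hypothetical name, a pseudo-count stands in for zero imputation, and the counts are simulated.

```r
# Empirical proportionality metrics under common literature definitions,
# computed on CLR-transformed data. Whether countprop uses exactly these
# forms is an assumption of this sketch.
naive_metrics <- function(counts, pseudo.count = 1) {
  x <- counts + pseudo.count                 # avoid log(0); a simplification
  lx <- log(x)
  clr <- lx - rowMeans(lx)                   # centred log-ratio per sample
  k <- ncol(x)
  v <- apply(clr, 2, var)                    # Var(clr_i)
  vlr <- matrix(0, k, k)                     # Var(log(x_i / x_j))
  for (i in seq_len(k)) for (j in seq_len(k)) {
    vlr[i, j] <- var(lx[, i] - lx[, j])
  }
  list(standard = vlr,
       phi = vlr / v,                        # phi[i, j] = vlr[i, j] / Var(clr_i)
       rho = 1 - vlr / outer(v, v, "+"))
}

set.seed(1)
counts <- matrix(rpois(40, lambda = 20), nrow = 10)  # simulated toy counts
m <- naive_metrics(counts)
diag(m$rho)  # each feature is perfectly proportional to itself
```

Note that phi is not symmetric under this definition, because it is scaled by the variance of the first feature only.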
Estimates the variation matrix of count-compositional data based on the same approximation used in logitNormalVariation(), except that this function uses empirical estimates of mu and Sigma. Also performs zero-imputation using cmultRepl() from the zCompositions package.
pluginVariation(
  counts,
  type = c("standard", "phi", "phis", "rho"),
  order = c("second", "first"),
  impute.zeros = TRUE,
  ...
)
counts |
Matrix of counts; samples are rows and features are columns. |
type |
Type of variation metric to be calculated: |
order |
The order of the Taylor-series approximation to be used in the estimation |
impute.zeros |
If TRUE, then |
... |
Optional arguments passed to zero-imputation function |
An estimate of the requested metric of proportionality.
data(singlecell)
pluginVariation(singlecell)
pluginVariation(singlecell, type="phi")
pluginVariation(singlecell, type="rho")
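The "empirical estimates of mu and Sigma" can be pictured as the sample mean and covariance of log-ratio-transformed counts. The sketch below assumes the ALR with the last column as reference and replaces the package's zero imputation (cmultRepl() from zCompositions) with a simple pseudo-count; `plugin_params` is a hypothetical name and the counts are simulated.

```r
# Empirical (plug-in) estimates of mu and Sigma from ALR-transformed counts.
# A simple pseudo-count stands in for the package's zero imputation
# (cmultRepl() from zCompositions); that substitution is an assumption.
plugin_params <- function(counts, pseudo.count = 0.5) {
  x <- counts + pseudo.count
  k <- ncol(x)
  v <- log(x[, -k, drop = FALSE] / x[, k])   # ALR with last column as reference
  list(mu = colMeans(v), Sigma = cov(v))
}

set.seed(1)
counts <- matrix(rpois(40, lambda = 20), nrow = 10)  # simulated toy counts
est <- plugin_params(counts)
length(est$mu)   # k - 1 coordinates
dim(est$Sigma)   # (k - 1) x (k - 1)
```

Estimates of this kind could then be passed to a parameter-based routine in place of maximum likelihood estimates.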
A subset of single-cell data from Buettner et al. (2015). Contains single-cell measurements from 96 mouse embryonic stem cells, all in G1 phase.
data(singlecell)
singlecell: A matrix with 96 rows and 10 columns.
<https://www.ebi.ac.uk/biostudies/arrayexpress/studies/E-MTAB-2805>
data(singlecell)