Manninm

Reputation: 181

Column-wise zscore using row mean and row standard deviation in R

I am attempting to compute column-wise z-scores using the row mean and row standard deviation in R. I am new to more complex functions like apply(), so I am not sure of the best way to do this without writing a nested for loop by hand. Exp is an expression matrix and will be very large, so a nested for loop will take some time. Please excuse the rough syntax. I need to retain the colnames.

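Dat <- Exp  # start from a copy so the dimnames are retained
for (i in seq_len(nrow(Exp))) {
  for (j in seq_len(ncol(Exp))) {
    # z-score of cell (i, j) using the mean and sd of row i
    Dat[i, j] <- (Exp[i, j] - mean(Exp[i, ])) / sd(Exp[i, ])
  }
}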

If I use apply() by rows, I don't retain the column names, and if I use apply() by columns, the wrong means and standard deviations are used. I need to iterate over each cell and compute a z-score using the mean and standard deviation of that cell's row, while keeping the column names.

Any direction to resources or help would be appreciated. Thanks!

Upvotes: 1

Views: 565

Answers (4)

user11538509

Reputation:

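I would use the scale() function, which does exactly what you want. Note that scale() standardizes columns, so transpose the matrix before and after to get row-wise z-scores.

mtx <- matrix(rnorm(100), ncol= 2)
mtx_z <- t(scale(t(mtx)))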

Upvotes: 1

Michail

Reputation: 187

First, do not use apply. You can create your own colVars function using colMeans. That is what I did, and that function now exists in C++ as Rfast::colVars(x).
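As a rough illustration of that idea in base R only, the sketch below builds column variances from colMeans() and then reuses them for row-wise z-scores. The names col_vars and row_sds are hypothetical helpers made up for this example (they are not the Rfast implementation), and the small matrix m is invented test data.

```r
# Hypothetical helper: sample variance of each column, built from colMeans() only.
col_vars <- function(x) {
  n <- nrow(x)
  centered <- sweep(x, 2, colMeans(x))   # subtract each column's mean
  colMeans(centered^2) * n / (n - 1)     # rescale to the n - 1 denominator
}

# Row-wise analogue (what the question needs): rows of x are columns of t(x).
row_sds <- function(x) sqrt(col_vars(t(x)))

# Made-up example data with column names
set.seed(1)
m <- matrix(rnorm(30), nrow = 5, dimnames = list(NULL, paste0("s", 1:6)))

# Row-wise z-scores; the column names of m carry over to z
z <- (m - rowMeans(m)) / row_sds(m)
```

The transpose copies the matrix, so for a very large matrix a dedicated row-wise function (such as the packaged ones mentioned in the answers here) is likely the better choice.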

Upvotes: 2

akrun

Reputation: 886938

We can use rowMeans (base R) with rowSds (from matrixStats)

library(matrixStats)
(mtx - rowMeans(mtx))/rowSds(mtx)

data

set.seed(42)
mtx <- matrix(rnorm(200), ncol=10)

Upvotes: 1

r2evans

Reputation: 160407

We can speed this up (and make it more readable).

Fake data:

set.seed(42)
mtx <- matrix(rnorm(200), ncol=10)
mtx[1:4,1:3]
#            [,1]       [,2]       [,3]
# [1,]  1.3709584 -0.3066386  0.2059986
# [2,] -0.5646982 -1.7813084 -0.3610573
# [3,]  0.3631284 -0.1719174  0.7581632
# [4,]  0.6328626  1.2146747 -0.7267048

We can calculate the row-wise mean and standard deviation with:

rowSigma <- apply(mtx, 1, sd, na.rm = TRUE)
rowMu <- rowMeans(mtx, na.rm = TRUE)

(I'm inferring na.rm=TRUE here ... though it might not be relevant for your data.)

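From here, note that element-wise arithmetic between a matrix and a vector (not linear algebra ops) recycles the vector down the columns, so a vector with one value per row lines up with the rows. To demonstrate/prove this,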

m <- matrix(1:9, nrow = 3)
m
#      [,1] [,2] [,3]
# [1,]    1    4    7
# [2,]    2    5    8
# [3,]    3    6    9
m + 1:3
#      [,1] [,2] [,3]
# [1,]    2    5    8
# [2,]    4    7   10
# [3,]    6    9   12
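To make the consequence concrete, here is a minimal sketch on the same small matrix: a per-row vector can be subtracted directly, while a per-column vector should go through sweep() (or a transpose), because plain arithmetic would recycle it down the columns instead. The variable names row_centered and col_centered are just illustrative.

```r
m <- matrix(1:9, nrow = 3)

# One value per row: plain arithmetic lines the vector up with the rows
row_centered <- m - rowMeans(m)

# One value per column: sweep() applies the vector along margin 2 (columns)
col_centered <- sweep(m, 2, colMeans(m))

rowMeans(row_centered)   # every row mean is now 0
colMeans(col_centered)   # every column mean is now 0
```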

With that confidence, we can now simply do

(mtx - rowMu) / rowSigma
#         [,1]  [,2]    [,3]    [,4]   [,5]    [,6]   [,7]  [,8]  [,9]  [,10]
#  [1,]  1.259 -0.55  0.0051 -0.6119  1.412  1.0760 -1.824 -0.31 -0.41 -0.054
#  [2,] -0.049 -1.48  0.1907  0.8327  0.918  1.8429 -1.113 -0.43 -0.64 -0.071
#  [3,]  0.543 -0.49  1.3088  0.9670  0.011 -2.1048  0.081 -1.02  0.16  0.554
#  [4,]  0.336  0.95 -1.1044  1.1492 -0.462  1.6248 -1.391 -0.37 -0.72 -0.022
#  [5,]  0.604  2.16 -1.2403 -0.5734 -1.059 -0.5104  0.181 -0.25  0.80 -0.107
#  [6,] -0.426 -0.79  0.1849  1.1712  0.388 -0.1863 -0.792  0.96  1.32 -1.821
#  [7,]  2.137 -0.17 -0.8968  0.6015 -0.121 -0.3886 -0.639 -0.47 -1.13  1.078
#  [8,]  0.017 -1.49  1.4082  1.0414 -0.063 -0.0085 -1.729 -0.29  0.51  0.603
#  [9,]  1.828  0.19 -0.7497  0.6731  0.686 -0.0977 -1.584  0.44 -0.21 -1.176
# [10,] -0.078 -0.76  0.7643  0.8408  0.959  0.1352  0.206 -1.24  1.05 -1.874
# [11,]  1.416  0.23  0.0434 -1.8632  1.538 -0.4413  0.387 -0.46 -0.73 -0.120
# [12,]  2.145  0.65 -0.7603 -0.1039 -0.469  0.0837 -0.485 -1.49  0.77 -0.345
# [13,] -1.427  0.79  1.2895  0.4169  0.442 -0.5993 -0.154  0.92 -1.75  0.077
# [14,] -0.358 -0.68  0.5283 -1.0064  1.248 -0.5745  0.990 -0.35  1.53 -1.334
# [15,]  0.068  0.74  0.3022 -0.3631 -0.960 -1.5391  1.722 -0.28  1.12 -0.801
# [16,]  0.993 -1.54  0.6062  0.9338 -0.618 -0.1029 -0.872 -1.02  0.15  1.477
# [17,] -0.055 -0.73  1.2414  1.3609 -1.195 -0.3619  0.170  0.32 -1.62  0.871
# [18,] -1.765 -0.56  0.0652  0.3143 -0.967  1.8055  0.806 -0.53  0.43  0.396
# [19,] -1.057 -1.04 -1.4293 -0.0093  0.642 -0.3303  0.271  0.23  0.91  1.811
# [20,]  1.498 -0.33  0.0227 -1.9499  0.547 -0.1876 -0.458  1.45 -0.39 -0.200

where each value is the z-score of the original data on a per-row basis. Spot-checking two cells by hand:

(mtx[1,1] - rowMu[1]) / rowSigma[1]
# [1] 1.26
(mtx[2,3] - rowMu[2]) / rowSigma[2]
# [1] 0.191
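Since the question asks about keeping column names, here is a small sketch of the same calculation on a matrix that has dimnames, checked against a slower apply()-based reference. The gene/sample names are made up for illustration, and Exp here is fake data standing in for the real expression matrix.

```r
set.seed(42)
Exp <- matrix(rnorm(200), ncol = 10,
              dimnames = list(paste0("gene", 1:20),      # made-up row names
                              paste0("sample", 1:10)))   # made-up column names

rowMu    <- rowMeans(Exp, na.rm = TRUE)
rowSigma <- apply(Exp, 1, sd, na.rm = TRUE)
Z <- (Exp - rowMu) / rowSigma

# Reference: standardize each row with apply(); apply() over rows returns the
# result transposed, so t() restores the original orientation
Z_ref <- t(apply(Exp, 1, function(r) (r - mean(r)) / sd(r)))

identical(dimnames(Z), dimnames(Exp))          # dimnames should carry over
all.equal(Z, Z_ref, check.attributes = FALSE)  # values should agree
```

The vectorized form keeps the dimnames because arithmetic between a matrix and a vector returns an object with the matrix's attributes.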

Upvotes: 1
