Reputation: 181
I am trying to compute a column-wise z-score using the row mean and row standard deviation in R. I am new to the more complex functions like apply(), so I am not sure of the best way to do this without writing it manually as a nested for loop. Exp is an expression matrix and will be very large, so a nested for loop will take some time. Please excuse the rough syntax. I need to retain the column names.
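Dat <- Exp  # start from a copy so the dimensions and column names match Exp
for (i in seq_len(nrow(Exp))) {
  for (j in seq_len(ncol(Exp))) {
    # z-score of cell (i, j) using the mean and sd of row i
    Dat[i, j] <- (Exp[i, j] - mean(Exp[i, ])) / sd(Exp[i, ])
  }
}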
If I use apply() by rows, I don't retain the column names, and if I use apply() by columns, the wrong means and standard deviations are used. For each cell, I need a z-score calculated from the mean and standard deviation of that cell's row, while keeping the column names.
Any direction to resources or help would be appreciated. Thanks!
Upvotes: 1
Views: 565
Reputation:
I would use the scale() function. It standardizes each column of a matrix, so for row-wise z-scores transpose the matrix, scale it, and transpose back.
mtx <- matrix(rnorm(100), ncol = 2)
mtx_z <- t(scale(t(mtx)))
Upvotes: 1
Reputation: 187
First, do not use apply. You can create your own colVars function using colMeans. That is what I did, and that function now exists in C++ as Rfast::colVars(x).
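As a rough illustration of that idea (building the variance from means rather than calling apply), here is a minimal base-R sketch for the row-wise case the question asks about. The helper names row_vars and row_sds are hypothetical and not part of Rfast or any other package, and the code assumes a numeric matrix with no missing values and at least two columns.

```r
# Hypothetical helper: sample variance of each row, built from rowMeans().
row_vars <- function(x) {
  n <- ncol(x)
  centered <- x - rowMeans(x)      # the row means recycle down the columns
  rowSums(centered^2) / (n - 1)    # unbiased (n - 1) denominator, as in var()
}

# Hypothetical helper: row standard deviations.
row_sds <- function(x) sqrt(row_vars(x))

set.seed(42)
mtx <- matrix(rnorm(200), ncol = 10)

# Should agree with the slower apply()-based version up to floating-point error.
all.equal(row_sds(mtx), apply(mtx, 1, sd))
```

The z-scores would then be (mtx - rowMeans(mtx)) / row_sds(mtx), with no apply call involved.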
Upvotes: 2
Reputation: 886938
We can use rowMeans with rowSds from matrixStats:
library(matrixStats)
set.seed(42)
mtx <- matrix(rnorm(200), ncol = 10)
(mtx - rowMeans(mtx)) / rowSds(mtx)
Upvotes: 1
Reputation: 160407
We can speed this up (and make it more readable).
Fake data:
set.seed(42)
mtx <- matrix(rnorm(200), ncol=10)
mtx[1:4,1:3]
#            [,1]       [,2]       [,3]
# [1,]  1.3709584 -0.3066386  0.2059986
# [2,] -0.5646982 -1.7813084 -0.3610573
# [3,]  0.3631284 -0.1719174  0.7581632
# [4,]  0.6328626  1.2146747 -0.7267048
We can calculate the row-wise mean and standard deviation with:
rowSigma <- apply(mtx, 1, sd, na.rm = TRUE)
rowMu <- rowMeans(mtx, na.rm = TRUE)
(I'm assuming na.rm = TRUE here, though it might not be relevant for your data.)
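To see why na.rm can matter, here is a small sketch with a made-up matrix containing one missing value. The values and the names m_na, mu_na, sigma_na, and z_na are invented for illustration only.

```r
# Made-up 2 x 3 matrix with a single NA in the first row.
m_na <- matrix(c(1, 4, NA, 5, 3, 6), nrow = 2)

# Without na.rm, any row containing an NA yields NA.
rowMeans(m_na)
apply(m_na, 1, sd)

# With na.rm = TRUE, the NA is dropped and the remaining values are used.
mu_na    <- rowMeans(m_na, na.rm = TRUE)
sigma_na <- apply(m_na, 1, sd, na.rm = TRUE)

# The NA cell itself stays NA in the z-score matrix.
z_na <- (m_na - mu_na) / sigma_na
z_na
```

If the data has no missing values, the two versions give the same result, so the argument is harmless to leave in.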
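From here, note that element-wise arithmetic between a matrix and a vector (as opposed to linear-algebra operations such as %*%) recycles the vector down the columns, because R stores matrices in column-major order. To demonstrate this,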
m <- matrix(1:9, nrow = 3)
m
#      [,1] [,2] [,3]
# [1,]    1    4    7
# [2,]    2    5    8
# [3,]    3    6    9
m + 1:3
#      [,1] [,2] [,3]
# [1,]    2    5    8
# [2,]    4    7   10
# [3,]    6    9   12
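The same recycling rule means that a vector of per-column values will not line up if it is simply added to the matrix. As a sketch of the difference (m is redefined here so the snippet runs on its own, and by_row and by_col are illustrative names), sweep() makes the margin explicit:

```r
m <- matrix(1:9, nrow = 3)

# Plain arithmetic recycles 1:3 down each column, so row i gets i added.
by_row <- m + 1:3

# sweep() with MARGIN = 2 applies element j of the vector to column j instead.
by_col <- sweep(m, 2, 1:3, "+")
by_col

# MARGIN = 1 should reproduce the plain recycling result.
all.equal(by_row, sweep(m, 1, 1:3, "+"))
```

For row-wise statistics such as a row mean, the plain form is enough, which is what the next step relies on.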
With that confidence, we can now simply do
(mtx - rowMu) / rowSigma
#         [,1]  [,2]    [,3]    [,4]   [,5]    [,6]   [,7]  [,8]  [,9]  [,10]
#  [1,]  1.259 -0.55  0.0051 -0.6119  1.412  1.0760 -1.824 -0.31 -0.41 -0.054
#  [2,] -0.049 -1.48  0.1907  0.8327  0.918  1.8429 -1.113 -0.43 -0.64 -0.071
#  [3,]  0.543 -0.49  1.3088  0.9670  0.011 -2.1048  0.081 -1.02  0.16  0.554
#  [4,]  0.336  0.95 -1.1044  1.1492 -0.462  1.6248 -1.391 -0.37 -0.72 -0.022
#  [5,]  0.604  2.16 -1.2403 -0.5734 -1.059 -0.5104  0.181 -0.25  0.80 -0.107
#  [6,] -0.426 -0.79  0.1849  1.1712  0.388 -0.1863 -0.792  0.96  1.32 -1.821
#  [7,]  2.137 -0.17 -0.8968  0.6015 -0.121 -0.3886 -0.639 -0.47 -1.13  1.078
#  [8,]  0.017 -1.49  1.4082  1.0414 -0.063 -0.0085 -1.729 -0.29  0.51  0.603
#  [9,]  1.828  0.19 -0.7497  0.6731  0.686 -0.0977 -1.584  0.44 -0.21 -1.176
# [10,] -0.078 -0.76  0.7643  0.8408  0.959  0.1352  0.206 -1.24  1.05 -1.874
# [11,]  1.416  0.23  0.0434 -1.8632  1.538 -0.4413  0.387 -0.46 -0.73 -0.120
# [12,]  2.145  0.65 -0.7603 -0.1039 -0.469  0.0837 -0.485 -1.49  0.77 -0.345
# [13,] -1.427  0.79  1.2895  0.4169  0.442 -0.5993 -0.154  0.92 -1.75  0.077
# [14,] -0.358 -0.68  0.5283 -1.0064  1.248 -0.5745  0.990 -0.35  1.53 -1.334
# [15,]  0.068  0.74  0.3022 -0.3631 -0.960 -1.5391  1.722 -0.28  1.12 -0.801
# [16,]  0.993 -1.54  0.6062  0.9338 -0.618 -0.1029 -0.872 -1.02  0.15  1.477
# [17,] -0.055 -0.73  1.2414  1.3609 -1.195 -0.3619  0.170  0.32 -1.62  0.871
# [18,] -1.765 -0.56  0.0652  0.3143 -0.967  1.8055  0.806 -0.53  0.43  0.396
# [19,] -1.057 -1.04 -1.4293 -0.0093  0.642 -0.3303  0.271  0.23  0.91  1.811
# [20,]  1.498 -0.33  0.0227 -1.9499  0.547 -0.1876 -0.458  1.45 -0.39 -0.200
where each value is the z-score of the original entry, computed from the mean and standard deviation of its own row. Spot-checking two cells:
(mtx[1,1] - rowMu[1]) / rowSigma[1]
# [1] 1.26
(mtx[2,3] - rowMu[2]) / rowSigma[2]
# [1] 0.191
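Since the question asks about keeping column names, here is a short sketch suggesting that this arithmetic leaves the matrix's dimnames untouched. The sample labels (S1 to S10) and the name z are made up for illustration, and the cross-check against scaling the transposed matrix is just one way to confirm the row-wise result.

```r
set.seed(42)
mtx <- matrix(rnorm(200), ncol = 10)
colnames(mtx) <- paste0("S", 1:10)   # hypothetical sample names

rowMu    <- rowMeans(mtx, na.rm = TRUE)
rowSigma <- apply(mtx, 1, sd, na.rm = TRUE)
z <- (mtx - rowMu) / rowSigma

# The column names carry over from mtx.
colnames(z)

# Each row of z should now have mean ~0 and standard deviation ~1.
range(rowMeans(z))
range(apply(z, 1, sd))

# Cross-check against scaling the transposed matrix.
all.equal(z, t(scale(t(mtx))), check.attributes = FALSE)
```

Because only element-wise subtraction and division are applied to mtx, the result has the same shape and names as the input.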
Upvotes: 1