Reputation: 21
I have on matrix in R with 440 rows and 261 columns. There are some 0 values. In each row I need to change the 0 values to the mean of all the values. I tried to do it with the code below, but every time it changed with only the first mean value.
snp2<- read.table("snp2.txt",h=T)
mean <- rowMeans(snp2)
for(k in 1:nrow(snp2))
{
snp2[k==0]<-mean[k]
}
Upvotes: 2
Views: 265
Reputation: 44309
Instead of looping through the rows, you could do this in one shot by identifying all the 0 indices in the matrix and replacing them with the appropriate row mean:
# Sample data
(mat <- matrix(c(0, 1, 2, 1, 0, 3, 11, 11, 11), nrow=3))
# [,1] [,2] [,3]
# [1,] 0 1 11
# [2,] 1 0 11
# [3,] 2 3 11
(zeroes <- which(mat == 0, arr.ind=TRUE))
# row col
# [1,] 1 1
# [2,] 2 2
mat[zeroes] <- rowMeans(mat)[zeroes[,"row"]]
mat
# [,1] [,2] [,3]
# [1,] 4 1 11
# [2,] 1 4 11
# [3,] 2 3 11
While you could fix up your function to replace this missing values row-by-row, this will not be as efficient as the one-shot approach (in addition to being more typing):
josilber <- function(mat) {
zeroes <- which(mat == 0, arr.ind=TRUE)
mat[zeroes] <- rowMeans(mat)[zeroes[,"row"]]
mat
}
OP.fixed <- function(mat) {
means <- rowMeans(mat)
for(k in 1:nrow(mat)) {
mat[k,][mat[k,] == 0] <- means[k]
}
mat
}
bgoldst <- function(m) ifelse(m==0,rowMeans({ mt <- m; mt[mt==0] <- NA; mt; },na.rm=T)[row(m)],m);
# 4400 x 2610 matrix
bigger <- matrix(sample(0:10, 4400*2610, replace=TRUE), nrow=4400)
all.equal(josilber(bigger), OP.fixed(bigger))
# [1] TRUE
# bgoldst differs because it takes means of non-zero values only
library(microbenchmark)
microbenchmark(josilber(bigger), OP.fixed(bigger), bgoldst(bigger), times=10)
# Unit: milliseconds
# expr min lq mean median uq max neval
# josilber(bigger) 262.541 382.0706 406.1107 395.3815 452.0872 532.4742 10
# OP.fixed(bigger) 1033.071 1184.7288 1236.6245 1238.8298 1271.7677 1606.6737 10
# bgoldst(bigger) 3820.044 4033.5826 4368.5848 4201.6302 4611.9697 5581.5514 10
For a fairly large matrix (4400 x 2610), the one-shot procedure is about 3 times quicker than the fixed up solution from the question and about 10 times faster than the one proposed by @bgoldst.
Upvotes: 1
Reputation: 35314
Here's a solution using ifelse()
, assuming you want to exclude zeroes from the mean calculation:
NR <- 5; NC <- 5;
set.seed(1); m <- matrix(sample(c(rep(0,5),1:5),NR*NC,replace=T),NR);
m;
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0 4 0 0 5
## [2,] 0 5 0 3 0
## [3,] 1 2 2 5 2
## [4,] 5 2 0 0 0
## [5,] 0 0 3 3 0
ifelse(m==0,rowMeans({ mt <- m; mt[mt==0] <- NA; mt; },na.rm=T)[row(m)],m);
## [,1] [,2] [,3] [,4] [,5]
## [1,] 4.5 4 4.5 4.5 5.0
## [2,] 4.0 5 4.0 3.0 4.0
## [3,] 1.0 2 2.0 5.0 2.0
## [4,] 5.0 2 3.5 3.5 3.5
## [5,] 3.0 3 3.0 3.0 3.0
Upvotes: 0