Reputation: 3805
I have a dataframe with 640 rows and 50 columns.
Columns 1 - 4 have some ID and character variables while 5 - 50 columns have the actual data. For each row, missing values in column 5:50 are shown as -9999. I want to do 2 things:
apply(temp[, 5:50], 1 , function(x) mean(x[x != -9999])) # for mean
apply(temp[, 5:50], 1 , function(x) sum(x[x != -9999])) # for number of values not equal to -9999
I am learning data.table
now so wondered how to implement the same in data.table. I achieved this:
temp[, .(Mean = rowMeans(.SD)), by = c('ID1','ID2','ID3','ID4')]
How do I exclude -9999 and also calculate the number of data points per row without including -9999?
Upvotes: 1
Views: 463
Reputation: 33417
I'd suggest replacing -9999
with NA
and then using na.rm = TRUE
for rowMeans
:
library(data.table)
temp <- data.table(replicate(4, rep("charVar", 640)), replicate(46, sample(c(0:100, -9999), 640, rep = TRUE)))
for (j in 5:50){set(temp, which(temp[[j]] == -9999), j, NA)}
temp[, .(Mean = rowMeans(.SD, na.rm = TRUE), Count = rowSums(!is.na(.SD))), .SDcols=c(5:50)]
# If you want to add the new columns to the existing data.table use:
# temp[, c("Mean", "Count") := .(rowMeans(.SD, na.rm = TRUE), rowSums(!is.na(.SD))), .SDcols=c(5:50)]
Upvotes: 2