89_Simple
89_Simple

Reputation: 3805

Row mean and number of entries per row using data.table in R

I have a dataframe with 640 rows and 50 columns.

Columns 1 - 4 have some ID and character variables while 5 - 50 columns have the actual data. For each row, missing values in column 5:50 are shown as -9999. I want to do 2 things:

apply(temp[, 5:50], 1 , function(x) mean(x[x != -9999]))  # for mean
apply(temp[, 5:50], 1 , function(x) sum(x[x != -9999]))   # for number of values not equal to -9999

I am learning data.table now so wondered how to implement the same in data.table. I achieved this:

temp[, .(Mean = rowMeans(.SD)), by = c('ID1','ID2','ID3','ID4')]

How do I exclude -9999 and also calculate the number of data points per row without including -9999?

Upvotes: 1

Views: 463

Answers (1)

ismirsehregal
ismirsehregal

Reputation: 33417

I'd suggest replacing -9999 with NA and then using na.rm = TRUE for rowMeans:

library(data.table)
temp <- data.table(replicate(4, rep("charVar", 640)), replicate(46, sample(c(0:100, -9999), 640, rep = TRUE)))

for (j in 5:50){set(temp, which(temp[[j]] == -9999), j, NA)}
temp[, .(Mean = rowMeans(.SD, na.rm = TRUE), Count = rowSums(!is.na(.SD))), .SDcols=c(5:50)]

# If you want to add the new columns to the existing data.table use:
# temp[, c("Mean", "Count") := .(rowMeans(.SD, na.rm = TRUE), rowSums(!is.na(.SD))), .SDcols=c(5:50)]

Upvotes: 2

Related Questions