epi_n00b
epi_n00b

Reputation: 150

R: Find average of multiple columns, only for the columns where the value > 0

I have four columns with patients' BMI measurements. Some patients have more measurements than others. Example:

    id <- c(1, 2, 3, 4, 5)
    bmi1 <- c(18, 25, 20, 30, 32)
    bmi2 <- c(18, 0, 0, 31, 34)
    bmi3 <- c(20, 0, 0, 0, 31)
    bmi4 <- c(0, 0, 0, 0, 32)
    bmi <- data.frame(id, bmi1, bmi2, bmi3, bmi4)

I want to create a fifth column, avg that averages the four bmi columns [2:5] together, but leaves out the zeros. So it would look like this:

      id bmi1 bmi2 bmi3 bmi4   avg
       1   18   18   20    0 18.67
       2   25    0    0    0 25.00
       3   20    0    0    0 20.00
       4   30   31    0    0 30.50
       5   32   34   31   32 32.50

So to take the average of row 2, it would simply calculate (25/1), but to take the average of row 1, it would calculate (18+18+20)/3. The IDs need to still be intact.

I've worked some with data table and would be happy for either a data frame or a data table solution.

Upvotes: 1

Views: 745

Answers (4)

count
count

Reputation: 1338

 `%notin%` <- function(x,y) !(x %in% y)
  bmi$avg <- apply(bmi[2:4],1, function(x) sum(x) / length(which(bmi[x,1] %notin% 0)))

This might also work.

Upvotes: 0

akrun
akrun

Reputation: 887691

We convert the '0' values to 'NA' in columns 2:5 and then use rowMeans with na.rm=TRUE.

 bmi[2:5][bmi[2:5]==0]<- NA

Or as @David Arenburg mentioned

 is.na(bmi[-1]) <- bmi[-1] == 0 #changes 0 values to NA
 bmi$avg <- round(rowMeans(bmi[2:5], na.rm=TRUE),2)
 bmi$avg
 #[1] 18.67 25.00 20.00 30.50 32.25

The above solution changes the '0' values to 'NA' in the original dataset, but if we don't want to change the values in the original 'bmi' object, then use replace with rowMeans.

 rowMeans(replace(bmi[2:5], bmi[2:5]==0, NA), na.rm=TRUE)

Upvotes: 4

David Arenburg
David Arenburg

Reputation: 92300

Here's another possibility

round(rowSums(bmi[-1]) / rowSums(bmi[-1] != 0), 2)
# [1] 18.67 25.00 20.00 30.50 32.25

This is just divides the row sums by the number of non zero values per row.


Or (if you don't have NAs) if we want to avoid matrix conversion

round(Reduce(`+`, bmi[-1]) / rowSums(bmi[-1] != 0), 2)

Upvotes: 2

scoa
scoa

Reputation: 19857

Here is a base-r solution with apply:

bmi$avg=apply(bmi[,2:4],1,function(x) mean(x[x !=0 ]))

Upvotes: 2

Related Questions