Reputation: 105
I have a dataframe with columns of simulation data. I want the means of each column. The issue is some columns have a bunch of zeroes on the bottom and these need to be ignored.
I can ignore the zeroes and look at one column with
mean(df$colname[df$colname > 0])
But I want a vector of every column's mean, gotten with sapply. Is there a clean way to ignore the zeroes and get these values within a sapply function?
Or do I have to write a custom function and call that in the sapply?
Upvotes: 0
Views: 661
Reputation: 887691
We can use a one-liner in base R:
colMeans(replace(df, !df, NA), na.rm = TRUE)
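To see why this works, here is a minimal sketch on a made-up data frame (the name `toy` and its values are assumptions for illustration, not the asker's data). Negating a number with `!` coerces it to logical first, so `!toy` is TRUE exactly where a value is 0, and `replace()` turns those cells into NA before `colMeans()` skips them.

```r
# Hypothetical toy data: trailing zeros stand in for unused simulation rows
toy <- data.frame(a = c(2, 4, 0, 0), b = c(1, 2, 3, 0))

# !toy is TRUE where a value equals 0 (zero coerces to FALSE, then is negated)
zero_mask <- !toy

# Replace the flagged cells with NA, then average each column while skipping NA
res <- colMeans(replace(toy, zero_mask, NA), na.rm = TRUE)
res
```

Here column `a` averages only 2 and 4, and column `b` averages 1, 2 and 3.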
Or with dplyr
library(dplyr)
df %>%
  summarise(across(everything(), ~ mean(na_if(., 0), na.rm = TRUE)))
Upvotes: 0
Reputation: 389175
You can use:
sapply(df, function(x) mean(x[x != 0], na.rm = TRUE))
Or using dplyr:
library(dplyr)
df %>% summarise_all(~mean(.[. != 0], na.rm = TRUE))
A more efficient approach is to set all 0 values to NA and then use colMeans:
df[df == 0] <- NA
colMeans(df, na.rm = TRUE)
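As a rough check that the sapply and colMeans approaches agree, here is a small sketch on invented data (`toy`, `masked` and the values are assumptions for illustration only). It masks zeros on a copy, since `df[df == 0] <- NA` overwrites the data frame it is applied to, and it includes an all-zero column to show the edge case.

```r
# Hypothetical toy data; column c is entirely zero
toy <- data.frame(a = c(2, 4, 0, 0), b = c(1, 2, 3, 0), c = c(0, 0, 0, 0))

# Approach 1: drop zeros column by column, then take the mean
via_sapply <- sapply(toy, function(x) mean(x[x != 0], na.rm = TRUE))

# Approach 2: mask zeros on a copy so `toy` itself is left unchanged
masked <- toy
masked[masked == 0] <- NA
via_colmeans <- colMeans(masked, na.rm = TRUE)

# Compare the two named vectors
all.equal(via_sapply, via_colmeans)
```

A column with no nonzero values has nothing left to average, so both approaches return NaN for it. If that case can occur in your data, handle it explicitly afterwards.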
Upvotes: 2