AnthonyH

Reputation: 105

R get the means for all columns in a data frame while ignoring zeroes

I have a data frame with columns of simulation data, and I want the mean of each column. The problem is that some columns have a run of zeroes at the bottom, and these need to be ignored.

I can ignore the zeroes and look at one column with

mean(df$colname[df$colname > 0])

But I want a vector of every column's mean, computed with sapply. Is there a clean way to ignore the zeroes and get these values within a sapply call?

Or do I have to write a custom function and call that in the sapply?

Upvotes: 0

Views: 661

Answers (2)

akrun

Reputation: 887691

We can use a one-liner in base R:

colMeans(replace(df, !df, NA), na.rm = TRUE)

Or with dplyr

library(dplyr)
df %>%
   summarise(across(everything(), ~ mean(na_if(., 0), na.rm = TRUE)))

Upvotes: 0

Ronak Shah

Reputation: 389175

You can use:

sapply(df, function(x) mean(x[x != 0], na.rm = TRUE))
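
As a quick check, here is a minimal sketch on a made-up data frame (the column names `a` and `b` and their values are hypothetical, not from the question):

```r
# Hypothetical simulation output; the trailing zeroes are padding, not data
df <- data.frame(a = c(2, 4, 0, 0), b = c(1, 2, 3, 0))

# Drop the zeroes from each column before averaging
col_means <- sapply(df, function(x) mean(x[x != 0], na.rm = TRUE))
col_means
#> a b 
#> 3 2 
```

Note that a column consisting only of zeroes would give NaN here, since the mean of an empty vector is NaN.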

Or using dplyr:

library(dplyr)
df %>% summarise_all(~mean(.[. != 0], na.rm = TRUE))

A more efficient approach is to set all 0 values to NA and use colMeans. Note that this overwrites the zeroes in df:

df[df == 0] <- NA
colMeans(df, na.rm = TRUE)
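
If the original zeroes should be kept, the same idea can be applied to a copy. A minimal sketch, assuming a made-up data frame (`df_na` and the sample values are hypothetical):

```r
# Hypothetical simulation output; the trailing zeroes are padding, not data
df <- data.frame(a = c(2, 4, 0, 0), b = c(1, 2, 3, 0))

df_na <- df              # work on a copy so df keeps its zeroes
df_na[df_na == 0] <- NA  # mark zeroes as missing
col_means <- colMeans(df_na, na.rm = TRUE)
col_means
#> a b 
#> 3 2 
```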

Upvotes: 2
