Rfanatic

Reputation: 2282

How to calculate means when you have missing values?

I would like to calculate the mean of a data frame that has some missing values. The sum of the data frame is 500 and the number of non-missing cells is 28, so the mean should be 17.8571. However, when calculating it in R I need to mark the missing cells with 0, which changes the mean value.


Sample data:

df <- structure(list(
  `10` = c(10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10),
  `20` = c(20, 20, 20, 20, 20, 20, 20, 20, NA, NA, NA, NA, NA, NA),
  `30` = c(30, 30, 30, 30, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
  `40` = c(40, 40, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)
), row.names = c(NA, -14L), class = c("tbl_df", "tbl", "data.frame"))

Sample code:

df1<-rowMeans(df, na.rm=TRUE) # I also tried colMeans
df2<-mean(df1)

Where is my mistake?

Upvotes: 0

Views: 2510

Answers (3)

GKi

Reputation: 39717

You can convert your data.frame to a vector using unlist and then calculate the mean with the argument na.rm=TRUE to skip the NA values.

mean(unlist(df), na.rm=TRUE)
#[1] 17.85714

Another option is to convert the data.frame to a matrix.

mean(as.matrix(df), na.rm=TRUE)
#[1] 17.85714
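
As for why the code in the question gives a different number: taking the mean of the row means gives every row the same weight, even though the rows contain different numbers of non-missing cells. Below is a minimal sketch of that effect. It rebuilds the sample data as a plain data.frame (an assumption made here so the block runs without the tibble package), and the names row_means and n_per_row are only illustrative.

```r
# Sample data from the question, rebuilt as a plain data.frame
df <- data.frame(
  `10` = rep(10, 14),
  `20` = c(rep(20, 8), rep(NA, 6)),
  `30` = c(rep(30, 4), rep(NA, 10)),
  `40` = c(rep(40, 2), rep(NA, 12)),
  check.names = FALSE
)

# Mean of row means: every row counts equally, whatever its number of values
row_means <- rowMeans(df, na.rm = TRUE)
mean(row_means)
#[1] 15

# Weighting each row mean by its count of non-missing cells
# recovers the pooled mean over all 28 cells
n_per_row <- rowSums(!is.na(df))
weighted.mean(row_means, n_per_row)
#[1] 17.85714
```

The same reasoning applies to colMeans, with colSums(!is.na(df)) as the weights.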

Upvotes: 2

Frank Zhang

Reputation: 1688

sum(df,na.rm = TRUE)/sum(!is.na(df))
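
The one-liner above can be split into its two parts to check them against the figures in the question (sum 500, 28 cells). This is a small sketch on the sample data rebuilt as a plain data.frame; the variable names total and n_obs are assumptions chosen for illustration.

```r
# Sample data from the question, rebuilt as a plain data.frame
df <- data.frame(
  `10` = rep(10, 14),
  `20` = c(rep(20, 8), rep(NA, 6)),
  `30` = c(rep(30, 4), rep(NA, 10)),
  `40` = c(rep(40, 2), rep(NA, 12)),
  check.names = FALSE
)

# Numerator: sum of all non-missing cells
total <- sum(df, na.rm = TRUE)
total
#[1] 500

# Denominator: number of non-missing cells
n_obs <- sum(!is.na(df))
n_obs
#[1] 28

total / n_obs
#[1] 17.85714
```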

Upvotes: 3

Ronak Shah

Reputation: 389175

To match the mean from Excel, you can repeat each value in df$time as many times as the corresponding entry in df$df.

mean(rep(df$time, df$df))
#[1] 17.85714
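
The sample data in the question is in wide form and has no time or df columns, so here is a hedged sketch of how the same idea could be applied to it. It assumes that each column name (10, 20, 30, 40) is the value stored in that column, which holds for the sample data, and it builds a hypothetical value/count table (counts_tbl) whose column names mirror the ones used above.

```r
# Sample data from the question, rebuilt as a plain data.frame
df <- data.frame(
  `10` = rep(10, 14),
  `20` = c(rep(20, 8), rep(NA, 6)),
  `30` = c(rep(30, 4), rep(NA, 10)),
  `40` = c(rep(40, 2), rep(NA, 12)),
  check.names = FALSE
)

# Number of non-missing cells per column: 14, 8, 4, 2
counts <- colSums(!is.na(df))

# Assumption: the column names are the values themselves
values <- as.numeric(names(counts))

# Hypothetical value/count table with the column names used in the answer
counts_tbl <- data.frame(time = values, df = counts)

# Repeat each value by its count, then average
mean(rep(counts_tbl$time, counts_tbl$df))
#[1] 17.85714

# Equivalent without expanding the vector
weighted.mean(counts_tbl$time, counts_tbl$df)
#[1] 17.85714
```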

Upvotes: 1
