Reputation: 2282
I would like to calculate the mean of a data frame that has some missing values. The sum of the data frame is 500 and the number of non-missing cells is 28, so the mean should be 17.8571. However, when calculating in R, I have to mark the missing cells with 0, which changes the mean value.
Sample data:
df<-structure(list(`10` = c(10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
10, 10, 10, 10), `20` = c(20, 20, 20, 20, 20, 20, 20, 20, NA,
NA, NA, NA, NA, NA), `30` = c(30, 30, 30, 30, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA), `40` = c(40, 40, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA, -14L), class = c("tbl_df",
"tbl", "data.frame"))
Sample code:
df1<-rowMeans(df, na.rm=TRUE) # I also tried colMeans
df2<-mean(df1)
Where is my mistake?
Upvotes: 0
Views: 2510
Reputation: 39717
You can convert your data.frame to a vector using unlist and then calculate the mean with the argument na.rm=TRUE to skip NA.
mean(unlist(df), na.rm=TRUE)
#[1] 17.85714
Another option is to convert the data.frame to a matrix.
mean(as.matrix(df), na.rm=TRUE)
#[1] 17.85714
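As for why the code in the question gives a different number: the mean of the row means gives every row the same weight, no matter how many non-NA cells it contains, whereas the overall mean gives every non-NA cell the same weight. Here is a minimal sketch of the difference. It assumes the same values as the question's df, rebuilt as a plain data.frame (tibble class dropped) so that it runs on its own.

```r
# Same values as the question's df, rebuilt as a plain data.frame (assumption)
df <- data.frame(
  `10` = rep(10, 14),
  `20` = c(rep(20, 8), rep(NA, 6)),
  `30` = c(rep(30, 4), rep(NA, 10)),
  `40` = c(rep(40, 2), rep(NA, 12)),
  check.names = FALSE
)

# Mean of row means: each row counts once, regardless of its non-NA count
mean(rowMeans(df, na.rm = TRUE))
#[1] 15

# Overall mean: sum of all non-NA cells divided by the number of non-NA cells
sum(df, na.rm = TRUE) / sum(!is.na(df))
#[1] 17.85714

# Weighting each row mean by its non-NA count recovers the overall mean
weighted.mean(rowMeans(df, na.rm = TRUE), rowSums(!is.na(df)))
#[1] 17.85714
```

So rowMeans followed by mean only matches the overall mean when every row has the same number of non-missing values.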
Upvotes: 2
Reputation: 389175
To match the mean with Excel, you can repeat each time value df number of times.
mean(rep(df$time, df$df))
#[1] 17.85714
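The line above expects a frequency-table layout with columns time and df, which is not how the question's data is stored. Below is a minimal sketch under that assumption. The table name freq is hypothetical, and the counts are the numbers of non-NA cells per value in the question's data.

```r
# Hypothetical frequency table (assumed layout): one row per distinct value,
# with df holding how many times that value occurs in the question's data
freq <- data.frame(time = c(10, 20, 30, 40), df = c(14, 8, 4, 2))

# Expand each value by its count, then take the plain mean
mean(rep(freq$time, freq$df))
#[1] 17.85714

# Same result without expanding: a count-weighted mean
weighted.mean(freq$time, freq$df)
#[1] 17.85714
```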
Upvotes: 1