Kuan Hoong

Reputation: 143

calculate mean for multiple columns in data.frame

I'm wondering whether it is possible to calculate the means of multiple columns using just the mean function.

Calling mean on a single column is possible, but calling mean(iris[, 1:4]) on several columns at once is not.

I got this warning message:

Warning message: In mean.default(iris[, 1:4]) : argument is not numeric or logical: returning NA

I know I can just use lapply(iris[,1:4],mean) or sapply(iris[,1:4],mean)

Upvotes: 12

Views: 85221

Answers (3)

Carlos Cinelli

Reputation: 11617

With sapply + Filter:

sapply(Filter(is.numeric, iris), mean)
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
    5.843333     3.057333     3.758000     1.199333 

With dplyr:

library(dplyr)
iris %>% summarise_each(funs(mean))
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1:     5.843333    3.057333        3.758    1.199333      NA

PS: in dplyr you can now use summarise_if, which drops the non-numeric Species column instead of returning NA for it:

iris %>% summarise_if(is.numeric, mean)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width
#> 1     5.843333    3.057333        3.758    1.199333
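
In dplyr 1.0.0 and later, summarise_each() and funs() are deprecated, and scoped verbs such as summarise_if() are superseded by across(). A minimal sketch of the same summary with across(), assuming dplyr >= 1.0.0 is installed and iris is still the built-in data frame:

```r
library(dplyr)

# Select only the numeric columns and take the mean of each one.
res <- iris %>% summarise(across(where(is.numeric), mean))
res
```

The result should have the same four columns as the summarise_if output above, with Species left out.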

With data.table:

library(data.table)
iris <- data.table(iris)
iris[, lapply(.SD, mean)]
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1:     5.843333    3.057333        3.758    1.199333      NA

Upvotes: 8

Pierre L

Reputation: 28461

Try colMeans. The columns must all be numeric, so for larger datasets you can add an is.numeric test to select them:

colMeans(iris[sapply(iris, is.numeric)])
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
    5.843333     3.057333     3.758000     1.199333 
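
If the numeric columns contain missing values, colMeans returns NA for those columns unless na.rm = TRUE is passed. A small sketch on a made-up data frame (the name df and its values are illustrative, not from the question):

```r
# Hypothetical data: a numeric column with an NA, a clean numeric column,
# and a character column that the filter should drop.
df <- data.frame(x = c(1, 2, NA), y = c(10, 20, 30), g = c("a", "b", "c"))

num <- df[sapply(df, is.numeric)]          # same is.numeric filter as above

with_na    <- colMeans(num)                # x is NA because of the missing value
without_na <- colMeans(num, na.rm = TRUE)  # x = 1.5, y = 20

with_na
without_na
```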


Benchmark: dplyr and data.table seem slow here. Perhaps someone can replicate the findings to verify them. The big.df and big.dt objects are built at the end of this answer.

library(microbenchmark)
microbenchmark(
  plafort = colMeans(big.df[sapply(big.df, is.numeric)]),
  Carlos  = colMeans(Filter(is.numeric, big.df)),
  Cdtable = big.dt[, lapply(.SD, mean)],
  Cdplyr  = big.df %>% summarise_each(funs(mean))
)
#Unit: milliseconds
#    expr       min        lq     mean    median       uq       max
# plafort  9.862934 10.506778 12.07027 10.699616 11.16404  31.23927
#  Carlos  9.215143  9.557987 11.30063  9.843197 10.21821  65.21379
# Cdtable 57.157250 64.866996 78.72452 67.633433 87.52451 264.60453
#  Cdplyr 62.933293 67.853312 81.77382 71.296555 91.44994 182.36578


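# Data used in the benchmark: 1000 rows, one character column plus 1000 numeric columns
library(data.table)
m <- matrix(1:1e6, 1000)
m2 <- matrix(rep('a', 1000), ncol=1)
big.df <- as.data.frame(cbind(m2, m), stringsAsFactors=FALSE)
big.df[,-1] <- lapply(big.df[,-1], as.numeric)  # cbind coerced everything to character
big.dt <- as.data.table(big.df)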

Upvotes: 13


Reputation: 312

Your solution above does work, provided the columns are numeric (that is, is.numeric returns TRUE for them). See the example below:

a <- c(1,2,3)
b <- c(2,4,6)
d <- c(3,6,9)
mydata <- cbind(b,a,d)
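
One caveat with this example: cbind() on plain vectors returns a matrix, not a data.frame, and mean() on a numeric matrix averages all of its cells into a single number. A short sketch of the difference, with the vectors repeated so the block runs on its own (the names overall, per_col, mydf and per_col_df are only illustrative):

```r
a <- c(1, 2, 3)
b <- c(2, 4, 6)
d <- c(3, 6, 9)

mydata <- cbind(b, a, d)          # a 3 x 3 numeric matrix, not a data.frame

overall <- mean(mydata)           # one number: the mean of all nine values
per_col <- colMeans(mydata)       # one mean per column

mydf <- as.data.frame(mydata)     # as a data.frame, each column is a list element
per_col_df <- sapply(mydf, mean)  # again one mean per column

overall
per_col
per_col_df
```

So mean() runs without a warning on the matrix, but it does not give per-column means; colMeans or sapply is still needed for that.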


Upvotes: 0
