Computing column means for a subset of columns in a data frame (R programming)

Question

I have a simple data frame:

a=data.frame(first=c(1,2,3),second=c(3,4,5),third=c('x','y','z'))

I'm trying to return a data frame that contains the column means for just the first and second columns. I've been doing it like this:

apply(a[,c('first','second')],2,mean)

This returns the expected output:

first second 
     2      4

However, I'd like to know whether I can do this with the by() function. I tried this:

by(a, c("first", "second"), mean)

This resulted in:

Error in tapply(seq_len(3L), list(`c("first", "second")` = c("first",  : 
  arguments must have same length

Then, I tried this:

by(a, c(T, T,F), mean)

This also did not give the correct answer:

c(T,T,F): FALSE
[1] NA

Any suggestions? Thanks!
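A note on why the by() attempts fail: by(data, INDICES, FUN) splits the rows of data into groups defined by INDICES, which must supply one label per row (three here), and then calls FUN on each group. A vector of two column names therefore triggers the "arguments must have same length" error, and c(T, T, F) is read as two row groups, so mean() receives whole sub-data-frames and returns NA. The sketch below shows the kind of task by() is meant for; the grouping vector grp is a hypothetical example, not part of the question. When you want means over whole columns, no grouping is involved, so by() is not the right tool.

```r
# Example data from the question
a <- data.frame(first = c(1, 2, 3), second = c(3, 4, 5), third = c("x", "y", "z"))

# Hypothetical grouping vector: by() needs one group label per ROW of the data
grp <- c("g1", "g1", "g2")

# Split the rows of the two numeric columns by grp, then take column means per group
res <- by(a[, c("first", "second")], grp, colMeans)
res
# per-group means: g1 -> first 1.5, second 3.5; g2 -> first 3, second 5
```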

Rich Scriven · Accepted Answer

You can use colMeans (column means) on a subset of the original data:

> a <- data.frame(first = c(1,2,3), second = c(3,4,5), third = c('x','y','z'))

If you know the column numbers but not the column names,

> colMeans(a[, 1:2])
## first second 
##     2      4

Or, if you don't know the column numbers but do know the column names,

> colMeans(a[, c("first", "second")])
## first second 
##     2      4
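colMeans() returns a named numeric vector rather than a data frame. Since the question asks for a data frame, here is a minimal sketch of one way to turn the result into a one-row data frame; the variable names means and means_df are just illustrative.

```r
a <- data.frame(first = c(1, 2, 3), second = c(3, 4, 5), third = c("x", "y", "z"))

# colMeans() gives a named numeric vector: c(first = 2, second = 4)
means <- colMeans(a[, c("first", "second")])

# as.list() keeps the names, so each mean becomes its own column
means_df <- as.data.frame(as.list(means))
means_df
#   first second
# 1     2      4
```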

Finally, if you know nothing about the columns and want the means for the numeric columns only,

> colMeans(a[, sapply(a, is.numeric)])
## first second 
##     2      4
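One caveat with the last form: if only one column is numeric, a[, sapply(a, is.numeric)] drops to a plain vector and colMeans() stops with an error, because it needs at least two dimensions. The sketch below uses a hypothetical data frame b with a single numeric column to show two ways to keep a data frame: drop = FALSE, or list-style indexing without the comma, which never drops.

```r
# Hypothetical data frame with only ONE numeric column
b <- data.frame(first = c(1, 2, 3), third = c("x", "y", "z"))

# drop = FALSE keeps the subset as a one-column data frame
num_means <- colMeans(b[, sapply(b, is.numeric), drop = FALSE])

# List-style indexing (no comma) always returns a data frame
num_means2 <- colMeans(b[sapply(b, is.numeric)])

num_means
# first 
#     2 
```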
