monkeybiz7

Reputation: 5128

Computing subset of column means in data frame (R programming)

I have a simple data frame:

a=data.frame(first=c(1,2,3),second=c(3,4,5),third=c('x','y','z'))

I'm trying to return a data frame that contains the column means for just the first and second columns. I've been doing it like this:

apply(a[,c('first','second')],2,mean)

Which returns the appropriate output:

first second 
     2      4 

However, I want to know if I can do it using the function by. I tried this:

by(a, c("first", "second"), mean)

Which resulted in:

Error in tapply(seq_len(3L), list(`c("first", "second")` = c("first",  : 
  arguments must have same length

Then, I tried this:

by(a, c(T, T,F), mean)

Which also did not yield the correct answer:

c(T,T,F): FALSE
[1] NA

Any suggestions? Thanks!

Upvotes: 0

Views: 2518

Answers (2)

Rich Scriven

Reputation: 99331

You can use colMeans (column means) on a subset of the original data.

> a <- data.frame(first = c(1,2,3), second = c(3,4,5), third = c('x','y','z'))

If you know the column numbers but not the column names,

> colMeans(a[, 1:2])
## first second 
##     2      4 

Or, if you don't know the column numbers but know the column names,

> colMeans(a[, c("first", "second")])
## first second 
##     2      4 

Finally, if you know nothing about the columns and want the means for the numeric columns only,

> colMeans(a[, sapply(a, is.numeric), drop = FALSE])
## first second 
##     2      4 

Upvotes: 2

user14382

Reputation: 969

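by() is not the right tool here. It is a wrapper for tapply(), which splits the rows of your data frame into groups defined by one or more factors and applies a function to each group; it does not select columns. That is why your second attempt treated c(T, T, F) as a grouping of the three rows (rows 1 and 2 in one group, row 3 in the other) and then called mean() on each sub-data frame, which returns NA. If you had another column, say fourth, you could split your data frame using by() for that column and then operate on rows or columns within each group using apply().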
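
As a rough sketch of that idea, assuming a hypothetical grouping column fourth (it is not in your data) and one extra row so that each group has two rows, by() could be combined with colMeans() like this:

```r
# Illustrative data only: `fourth` is a made-up grouping column
a <- data.frame(first  = c(1, 2, 3, 4),
                second = c(3, 4, 5, 6),
                fourth = c("g1", "g1", "g2", "g2"))

# Split the rows by `fourth`, then compute the column means
# of the two numeric columns within each group
res <- by(a[, c("first", "second")], a$fourth, colMeans)

res[["g1"]]  # means for the rows where fourth == "g1"
res[["g2"]]  # means for the rows where fourth == "g2"
```

For group g1 this gives means of 1.5 for first and 3.5 for second; for g2, 3.5 and 5.5. Without a grouping column there is nothing for by() to split on, so colMeans() on the column subset is the simpler choice.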

Upvotes: 0
