I have a simple data frame:
a=data.frame(first=c(1,2,3),second=c(3,4,5),third=c('x','y','z'))
I'm trying to return a data frame that contains the column means for just the first and second columns. I've been doing it like this:
apply(a[,c('first','second')],2,mean)
Which returns the appropriate output:
 first second 
     2      4 
However, I want to know whether I can do it using the function by(). I tried this:
by(a, c("first", "second"), mean)
Which resulted in:
Error in tapply(seq_len(3L), list(`c("first", "second")` = c("first", :
arguments must have same length
Then, I tried this:
by(a, c(T, T,F), mean)
Which also did not yield the correct answer:
c(T,T,F): FALSE
[1] NA
Any suggestions? Thanks!
You can use colMeans() (column means) on a subset of the original data:
> a <- data.frame(first = c(1,2,3), second = c(3,4,5), third = c('x','y','z'))
If you know the column numbers but not the column names:
> colMeans(a[, 1:2])
## first second
## 2 4
Or, if you know the column names but not the column numbers:
> colMeans(a[, c("first", "second")])
## first second
## 2 4
Finally, if you know nothing about the columns and want the means of the numeric columns only:
> colMeans(a[, sapply(a, is.numeric)])
## first second
## 2 4
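A minimal sketch building on the last approach, with two assumptions beyond the original answer: drop = FALSE is added so the subset stays a data frame even when only one column is numeric (otherwise a[, cols] collapses to a plain vector and colMeans() errors), and, because the question asks for a data frame rather than a named vector, the result is wrapped as a one-row data frame. The names num_cols, means, means_df, b and b_means are illustrative only.

```r
a <- data.frame(first = c(1, 2, 3), second = c(3, 4, 5), third = c('x', 'y', 'z'))

# Named logical vector marking the numeric columns (first, second)
num_cols <- sapply(a, is.numeric)

# drop = FALSE keeps a data frame even if only one column is selected
means <- colMeans(a[, num_cols, drop = FALSE])
print(means)

# Wrap the named vector as a one-row data frame
means_df <- as.data.frame(as.list(means))
print(means_df)

# Edge case: a single numeric column; without drop = FALSE,
# colMeans() would receive a plain vector and stop with an error
b <- data.frame(x = c(1, 3), label = c('u', 'v'))
b_means <- colMeans(b[, sapply(b, is.numeric), drop = FALSE])
print(b_means)
```

as.list() turns the named vector into a list whose names become the column names of the one-row data frame.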
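by() is not the right tool here, because it is a wrapper around tapply(): it partitions the rows of a data frame into subsets defined by one or more grouping variables (the INDICES argument) and applies a function to each subset. It does not select columns. That explains both attempts: c("first", "second") is treated as a grouping vector of length 2, but a has 3 rows, hence "arguments must have same length"; and c(T, T, F) puts rows 1 and 2 in one group and row 3 in another, after which mean() of a data frame returns NA (with a warning). If you had another column, say fourth, you could split your data frame on that column using by() and then compute column means within each group, for example with colMeans() or apply().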
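A sketch of that last point, assuming a hypothetical grouping column fourth that is not in the original data. The columns are chosen by subsetting first; by() only decides which rows go together. The names res and res_mat are illustrative.

```r
a <- data.frame(first = c(1, 2, 3), second = c(3, 4, 5), third = c('x', 'y', 'z'))

# Hypothetical grouping column: rows 1 and 2 are "g1", row 3 is "g2"
a$fourth <- c('g1', 'g1', 'g2')

# by() splits the ROWS of the selected columns by a$fourth,
# then calls colMeans() on each sub-data-frame
res <- by(a[, c('first', 'second')], a$fourth, colMeans)
print(res)

# Stack the per-group results into a matrix: one row per group
res_mat <- do.call(rbind, res)
print(res_mat)
```

Here group g1 averages rows 1 and 2, and group g2 contains only row 3, so its "means" are just that row's values.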