Reputation: 7928
I'm trying to get the sum of a numerical variable for each level of a categorical variable (in a data frame). I've tried using tapply, but it doesn't take a whole data.frame.
Here is a working example with some data that looks like this:
> set.seed(667)
> df <- data.frame(a = sample(c("Group A","Group B","Group C",NA), 10, rep = TRUE),
b = sample(c(1, 2, 3, 4, 5, 6), 10, rep=TRUE),
c = sample(c(11, 12, 13, 14, 15, 16), 10, rep=TRUE))
> df
a b c
1 Group A 4 12
2 Group B 6 12
3 <NA> 4 14
4 Group C 1 16
5 <NA> 2 14
6 <NA> 3 13
7 Group C 4 13
8 <NA> 6 15
9 Group B 3 16
10 Group B 5 16
Using tapply, I can get one vector at a time:
> tapply(df$b,df$a,sum)
Group A Group B Group C
4 14 5
but I am more interested in getting something like this:
a b c
1 Group A 4 12
2 Group B 14 44
3 Group C 5 29
Any help would be appreciated. Thanks.
Upvotes: 2
Views: 429
Reputation: 263332
Use aggregate instead:
aggregate(df[ , c("b","c")], df['a'], FUN=sum)
a b c
1 Group A 4 12
2 Group B 14 44
3 Group C 5 29
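The second argument to aggregate (by) has to be a list of grouping elements. df['a'] is a one-column data frame, which counts as a list, whereas df$a is a plain vector and will error out. aggregate then applies the function to each column of the first argument within each group. Rows where the grouping variable is NA are dropped, which is why only three groups appear in the result.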
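If you would rather keep df$a, two other spellings should give the same table. This is a minimal sketch, not the only way to do it: the data frame is typed in by hand from the printout in the question (so it does not depend on set.seed giving the same draw on your R version), and the names by_list, by_formula and bad_call are just illustrative.

```r
# Data copied by hand from the printed data frame in the question
df <- data.frame(
  a = c("Group A", "Group B", NA, "Group C", NA, NA,
        "Group C", NA, "Group B", "Group B"),
  b = c(4, 6, 4, 1, 2, 3, 4, 6, 3, 5),
  c = c(12, 12, 14, 16, 14, 13, 13, 15, 16, 16)
)

# Option 1: wrap the grouping vector in a named list
by_list <- aggregate(df[, c("b", "c")], by = list(a = df$a), FUN = sum)

# Option 2: formula interface; cbind() selects the columns to sum
by_formula <- aggregate(cbind(b, c) ~ a, data = df, FUN = sum)

# Passing the bare vector is rejected, so capture the error instead of stopping
bad_call <- tryCatch(
  aggregate(df[, c("b", "c")], df$a, FUN = sum),
  error = function(e) e
)

print(by_list)
print(by_formula)
print(inherits(bad_call, "error"))
```

Both calls leave out the rows where a is NA. The formula version does this through its default na.action, which drops a row if any variable used in the formula is missing, so it can behave differently from option 1 when b or c contain NAs.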
Upvotes: 4