anna

Reputation: 1

How to sum a variable by group when I have more than two variables?

I have a problem similar to an earlier question by another user, "How to sum a variable by group?", but my data frame has more than two variables. It looks a little like this:

A   B    C      D        E 
1   m   1990    1989    200 
1   m   1990    1990    100
1   m   1991    1989    10 
2   m   1991    1990    20 
2   m   1991    1991    100
3   m   1992    1989    30 
3   m   1992    1990    20 
3   m   1992    1991    10
4   m   1992    1992    10 
4   m   1993    1989    50

I want to drop the variable D and sum E over every distinct combination of A, B and C, while keeping A, B and C in the result. I tried the advice given in the link above (aggregate, by, etc.), but I ended up with only two variables. I want something like this:

A   B   C       E
1   m   1990    300
1   m   1991    10
2   m   1991    120
3   m   1992    60
4   m   1992    10
4   m   1993    50

Thank you in advance!

(This is my first question, so please let me know if it's inappropriate / missing something.)

Upvotes: 0

Views: 488

Answers (2)

NWaters

Reputation: 1213

Check out the dplyr package. The solution would be something like:

library(dplyr)
data <- your_data
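# mutate() keeps every original row and adds the group total as a new column
data_summed <- data %>% group_by(A, B, C) %>% mutate(F = sum(E))

dplyr's select() can then be used to keep only the columns of interest (filter() picks rows, not columns), and distinct() to drop the repeated rows, for your final data.frame. Using summarise() in place of mutate() collapses each group to a single row directly.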

For variations, check out this cheatsheet; it's great.
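
A minimal sketch of the summarise() variant, which collapses each (A, B, C) group to one row and so drops D without a separate step. The data frame df is a hand-typed reconstruction of the question's example, and the name df_summed is only illustrative:

```r
library(dplyr)

# Hypothetical reconstruction of the data frame shown in the question
df <- data.frame(
  A = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4),
  B = "m",
  C = c(1990, 1990, 1991, 1991, 1991, 1992, 1992, 1992, 1992, 1993),
  D = c(1989, 1990, 1989, 1990, 1991, 1989, 1990, 1991, 1992, 1989),
  E = c(200, 100, 10, 20, 100, 30, 20, 10, 10, 50)
)

# One row per (A, B, C) combination; D is not mentioned, so it is dropped
df_summed <- df %>%
  group_by(A, B, C) %>%
  summarise(E = sum(E)) %>%
  ungroup()

print(df_summed)
# E column: 300, 10, 120, 60, 10, 50
```

Recent dplyr versions print a message about the remaining grouping after summarise(); the trailing ungroup() removes that grouping either way.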

Upvotes: 0

mattdevlin

Reputation: 1095

I think aggregate(E ~ A + B + C, data=df, FUN=sum) should do the trick. This splits the data on columns A, B and C and computes the sum of E within each group. (My test data frame below uses lowercase column names.)

> aggregate(e ~ a+b+c, data=df, FUN=sum)

  a b    c   e
1 1 m 1990 300
2 1 m 1991  10
3 2 m 1991 120
4 3 m 1992  60
5 4 m 1992  10
6 4 m 1993  50
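
For anyone who wants to reproduce this, here is a small self-contained sketch in base R. The data frame is typed in by hand from the question (with the question's uppercase names), and res and res_dot are illustrative names. The second call assumes you first subset to the columns you want to keep, so that the dot in the formula stands for "every remaining column":

```r
# Hypothetical reconstruction of the data frame shown in the question
df <- data.frame(
  A = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4),
  B = "m",
  C = c(1990, 1990, 1991, 1991, 1991, 1992, 1992, 1992, 1992, 1993),
  D = c(1989, 1990, 1989, 1990, 1991, 1989, 1990, 1991, 1992, 1989),
  E = c(200, 100, 10, 20, 100, 30, 20, 10, 10, 50)
)

# Sum E within each combination of A, B and C; D is left out of the formula
res <- aggregate(E ~ A + B + C, data = df, FUN = sum)

# Same grouping written with the dot shorthand, after dropping D explicitly
res_dot <- aggregate(E ~ ., data = df[, c("A", "B", "C", "E")], FUN = sum)

print(res)
```

The dot form is handy when there are many grouping columns, since only the columns to exclude need to be named.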

Upvotes: 0
