Selectively Combine R Dataframe through Index

Question

I need to find a way to selectively sum rows by a index. Here is an example r dataset:

Year  Category  Value  Index 
2002   A         100     1 
2002   B         200     1
2002   C         300     1
2002   A         400     2
2002   B         700     2
2002   C         900     2
2002   A         600     1  
2002   B         500     1
2002   C         400     1

2003   A         100     3 
2003   B         200     3
2003   C         300     3
2003   A         400     2
2003   B         700     2
2003   C         900     2
2003   A         600     2  
2003   B         500     2
2003   C         400     2

I'm hoping to get:

Year  Category  Value  Index 
2002   A         700     1 
2002   B         700     1
2002   C         700     1
2002   A         400     2
2002   B         700     2
2002   C         900     2

2003   A         100     3 
2003   B         200     3
2003   C         300     3
2003   A         1000    2
2003   B         1200    2
2003   C         1300    2

Basically, they will only sum up with the same index by category with different years. Is there an efficient way to do that instead of looping around? I got a long list of index here so I think looping around will be a bad idea.

r2evans · Accepted Answer

Base R:

aggregate(a$Value, a[c("Year", "Category", "Index")], sum)
#    Year Category Index    x
# 1  2002        A     1  700
# 2  2002        B     1  700
# 3  2002        C     1  700
# 4  2002        A     2  400
# 5  2003        A     2 1000
# 6  2002        B     2  700
# 7  2003        B     2 1200
# 8  2002        C     2  900
# 9  2003        C     2 1300
# 10 2003        A     3  100
# 11 2003        B     3  200
# 12 2003        C     3  300

I'm particularly fond of the formula interface, I find it more readable:

aggregate(Value ~ Year + Category + Index, a, sum)
aggregate(. ~ Year + Category + Index, a, sum) # multiple columns to be summed

(This is all shown in the link provided by @CalumYou.)

Data:

a <- read.table(header=TRUE, text='
Year  Category  Value  Index 
2002   A         100     1 
2002   B         200     1
2002   C         300     1
2002   A         400     2
2002   B         700     2
2002   C         900     2
2002   A         600     1  
2002   B         500     1
2002   C         400     1
2003   A         100     3 
2003   B         200     3
2003   C         300     3
2003   A         400     2
2003   B         700     2
2003   C         900     2
2003   A         600     2  
2003   B         500     2
2003   C         400     2')

Selectively Combine R Dataframe through Index

Answers (2)

Related Questions