Gordon Ji
Gordon Ji

Reputation: 13

Selectively Combine R Dataframe through Index

I need to find a way to selectively sum rows by a index. Here is an example r dataset:

Year  Category  Value  Index 
2002   A         100     1 
2002   B         200     1
2002   C         300     1
2002   A         400     2
2002   B         700     2
2002   C         900     2
2002   A         600     1  
2002   B         500     1
2002   C         400     1

2003   A         100     3 
2003   B         200     3
2003   C         300     3
2003   A         400     2
2003   B         700     2
2003   C         900     2
2003   A         600     2  
2003   B         500     2
2003   C         400     2

I'm hoping to get:

Year  Category  Value  Index 
2002   A         700     1 
2002   B         700     1
2002   C         700     1
2002   A         400     2
2002   B         700     2
2002   C         900     2

2003   A         100     3 
2003   B         200     3
2003   C         300     3
2003   A         1000    2
2003   B         1200    2
2003   C         1300    2

Basically, they will only sum up with the same index by category with different years. Is there an efficient way to do that instead of looping around? I got a long list of index here so I think looping around will be a bad idea.

Upvotes: 1

Views: 36

Answers (2)

r2evans
r2evans

Reputation: 160942

Base R:

aggregate(a$Value, a[c("Year", "Category", "Index")], sum)
#    Year Category Index    x
# 1  2002        A     1  700
# 2  2002        B     1  700
# 3  2002        C     1  700
# 4  2002        A     2  400
# 5  2003        A     2 1000
# 6  2002        B     2  700
# 7  2003        B     2 1200
# 8  2002        C     2  900
# 9  2003        C     2 1300
# 10 2003        A     3  100
# 11 2003        B     3  200
# 12 2003        C     3  300

I'm particularly fond of the formula interface, I find it more readable:

aggregate(Value ~ Year + Category + Index, a, sum)
aggregate(. ~ Year + Category + Index, a, sum) # multiple columns to be summed

(This is all shown in the link provided by @CalumYou.)


Data:

a <- read.table(header=TRUE, text='
Year  Category  Value  Index 
2002   A         100     1 
2002   B         200     1
2002   C         300     1
2002   A         400     2
2002   B         700     2
2002   C         900     2
2002   A         600     1  
2002   B         500     1
2002   C         400     1
2003   A         100     3 
2003   B         200     3
2003   C         300     3
2003   A         400     2
2003   B         700     2
2003   C         900     2
2003   A         600     2  
2003   B         500     2
2003   C         400     2')

Upvotes: 2

Calum You
Calum You

Reputation: 15072

You can use group_by and summarise. I am not sure your given desired output is exactly correct, unless I misunderstood the question.

library(tidyverse)
tbl <- structure(list(Year = c(2002L, 2002L, 2002L, 2002L, 2002L, 2002L, 2002L, 2002L, 2002L, 2003L, 2003L, 2003L, 2003L, 2003L, 2003L, 2003L, 2003L, 2003L), Category = c("A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C"), Value = c(100L, 200L, 300L, 400L, 700L, 900L, 600L, 500L, 400L, 100L, 200L, 300L, 400L, 700L, 900L, 600L, 500L, 400L), Index = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L)), row.names = c(NA, -18L), class = c("tbl_df", "tbl", "data.frame"), spec = structure(list(cols = list(Year = structure(list(), class = c("collector_integer", "collector")), Category = structure(list(), class = c("collector_character", "collector")), Value = structure(list(), class = c("collector_integer", "collector")), Index = structure(list(), class = c("collector_integer", "collector"))), default = structure(list(), class = c("collector_guess", "collector"))), class = "col_spec"))

tbl %>%
  group_by(Year, Category, Index) %>%
  summarise(Value = sum(Value)) %>%
  arrange(Year, Index, Category)
#> # A tibble: 12 x 4
#> # Groups:   Year, Category [6]
#>     Year Category Index Value
#>    <int> <chr>    <int> <int>
#>  1  2002 A            1   700
#>  2  2002 B            1   700
#>  3  2002 C            1   700
#>  4  2002 A            2   400
#>  5  2002 B            2   700
#>  6  2002 C            2   900
#>  7  2003 A            2  1000
#>  8  2003 B            2  1200
#>  9  2003 C            2  1300
#> 10  2003 A            3   100
#> 11  2003 B            3   200
#> 12  2003 C            3   300

Created on 2018-08-07 by the reprex package (v0.2.0).

Upvotes: 2

Related Questions