Reputation: 13
I need to find a way to selectively sum rows by a index. Here is an example r dataset:
Year Category Value Index
2002 A 100 1
2002 B 200 1
2002 C 300 1
2002 A 400 2
2002 B 700 2
2002 C 900 2
2002 A 600 1
2002 B 500 1
2002 C 400 1
2003 A 100 3
2003 B 200 3
2003 C 300 3
2003 A 400 2
2003 B 700 2
2003 C 900 2
2003 A 600 2
2003 B 500 2
2003 C 400 2
I'm hoping to get:
Year Category Value Index
2002 A 700 1
2002 B 700 1
2002 C 700 1
2002 A 400 2
2002 B 700 2
2002 C 900 2
2003 A 100 3
2003 B 200 3
2003 C 300 3
2003 A 1000 2
2003 B 1200 2
2003 C 1300 2
Basically, they will only sum up with the same index by category with different years. Is there an efficient way to do that instead of looping around? I got a long list of index here so I think looping around will be a bad idea.
Upvotes: 1
Views: 36
Reputation: 160942
Base R:
aggregate(a$Value, a[c("Year", "Category", "Index")], sum)
# Year Category Index x
# 1 2002 A 1 700
# 2 2002 B 1 700
# 3 2002 C 1 700
# 4 2002 A 2 400
# 5 2003 A 2 1000
# 6 2002 B 2 700
# 7 2003 B 2 1200
# 8 2002 C 2 900
# 9 2003 C 2 1300
# 10 2003 A 3 100
# 11 2003 B 3 200
# 12 2003 C 3 300
I'm particularly fond of the formula interface, I find it more readable:
aggregate(Value ~ Year + Category + Index, a, sum)
aggregate(. ~ Year + Category + Index, a, sum) # multiple columns to be summed
(This is all shown in the link provided by @CalumYou.)
Data:
a <- read.table(header=TRUE, text='
Year Category Value Index
2002 A 100 1
2002 B 200 1
2002 C 300 1
2002 A 400 2
2002 B 700 2
2002 C 900 2
2002 A 600 1
2002 B 500 1
2002 C 400 1
2003 A 100 3
2003 B 200 3
2003 C 300 3
2003 A 400 2
2003 B 700 2
2003 C 900 2
2003 A 600 2
2003 B 500 2
2003 C 400 2')
Upvotes: 2
Reputation: 15072
You can use group_by
and summarise
. I am not sure your given desired output is exactly correct, unless I misunderstood the question.
library(tidyverse)
tbl <- structure(list(Year = c(2002L, 2002L, 2002L, 2002L, 2002L, 2002L, 2002L, 2002L, 2002L, 2003L, 2003L, 2003L, 2003L, 2003L, 2003L, 2003L, 2003L, 2003L), Category = c("A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C"), Value = c(100L, 200L, 300L, 400L, 700L, 900L, 600L, 500L, 400L, 100L, 200L, 300L, 400L, 700L, 900L, 600L, 500L, 400L), Index = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L)), row.names = c(NA, -18L), class = c("tbl_df", "tbl", "data.frame"), spec = structure(list(cols = list(Year = structure(list(), class = c("collector_integer", "collector")), Category = structure(list(), class = c("collector_character", "collector")), Value = structure(list(), class = c("collector_integer", "collector")), Index = structure(list(), class = c("collector_integer", "collector"))), default = structure(list(), class = c("collector_guess", "collector"))), class = "col_spec"))
tbl %>%
group_by(Year, Category, Index) %>%
summarise(Value = sum(Value)) %>%
arrange(Year, Index, Category)
#> # A tibble: 12 x 4
#> # Groups: Year, Category [6]
#> Year Category Index Value
#> <int> <chr> <int> <int>
#> 1 2002 A 1 700
#> 2 2002 B 1 700
#> 3 2002 C 1 700
#> 4 2002 A 2 400
#> 5 2002 B 2 700
#> 6 2002 C 2 900
#> 7 2003 A 2 1000
#> 8 2003 B 2 1200
#> 9 2003 C 2 1300
#> 10 2003 A 3 100
#> 11 2003 B 3 200
#> 12 2003 C 3 300
Created on 2018-08-07 by the reprex package (v0.2.0).
Upvotes: 2