Reputation: 584
I need to aggregate the previous 5 years of the N_C variable in each row.
For example: year 2017 - Sum_Five_Years = 10(2017)+21(2015)+14(2014)+16(2013) = 61
Data:
library(dplyr)
DF<-data.frame(company = c("DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM"),
year= c("2017","2015","2015","2015","2013","2012","2012","2012","2010","2010","2015","2014","2014","2013","2013","2012"),
N_C= c("0","7","5","4","3","24","52","99","43","37","5","7","7","4","9","20"), Sum_Year = c("0","21","21","21","16","195","195","195","80","80","21","14","14","16","16","195"))
DF <- DF %>% arrange(year)
company year N_C Sum_Year
1 DEL MAR PHARM 2010 43 80
2 DEL MAR PHARM 2010 37 80
3 DEL MAR PHARM 2012 24 195
4 DEL MAR PHARM 2012 52 195
5 DEL MAR PHARM 2012 99 195
6 DEL MAR PHARM 2012 20 195
7 DEL MAR PHARM 2013 3 16
8 DEL MAR PHARM 2013 4 16
9 DEL MAR PHARM 2013 9 16
10 DEL MAR PHARM 2014 7 14
11 DEL MAR PHARM 2014 7 14
12 DEL MAR PHARM 2015 7 21
13 DEL MAR PHARM 2015 5 21
14 DEL MAR PHARM 2015 4 21
15 DEL MAR PHARM 2015 5 21
16 DEL MAR PHARM 2017 10 10
Expected Outcome
DF$Sum_Five_Year <- cbind(c("80","80","275","275","275","275","291","291","291","305","305","246","246","246","246","61"))
> DF
company year N_C Sum_Year Sum_Five_Year
1 DEL MAR PHARM 2010 43 80 80
2 DEL MAR PHARM 2010 37 80 80
3 DEL MAR PHARM 2012 24 195 275
4 DEL MAR PHARM 2012 52 195 275
5 DEL MAR PHARM 2012 99 195 275
6 DEL MAR PHARM 2012 20 195 275
7 DEL MAR PHARM 2013 3 16 291
8 DEL MAR PHARM 2013 4 16 291
9 DEL MAR PHARM 2013 9 16 291
10 DEL MAR PHARM 2014 7 14 305
11 DEL MAR PHARM 2014 7 14 305
12 DEL MAR PHARM 2015 7 21 246
13 DEL MAR PHARM 2015 5 21 246
14 DEL MAR PHARM 2015 4 21 246
15 DEL MAR PHARM 2015 5 21 246
16 DEL MAR PHARM 2017 10 10 61
I have tried the following code but it does not work:
library(data.table)
setDT(DF)
DF[, `:=` (Sum_Five_Year= sum(N_C)), by= list(company,cut(year, breaks = c(5), right = F))]
Any suggestion would be very appreciated :)
Upvotes: 1
Views: 182
Reputation: 14764
With no additional packages, you could use sapply
.
The code below assumes that Sum_Year
has already been created. You could apply the following directly to your example:
distinct(DF, company, year, Sum_Year) %>%
group_by(company) %>%
mutate(
year = as.integer(as.character(year)),
Sum_Five_Year = sapply(year, function(x) sum(Sum_Year[between(year, x - 5 + 1, x)]))
) %>%
left_join(DF %>% select(-Sum_Year), by = c("company", "year"))
Output:
# A tibble: 16 x 5
# Groups: company [?]
company year Sum_Year Sum_Five_Year N_C
<chr> <int> <int> <int> <int>
1 DELMARPHARM 2010 80 80 43
2 DELMARPHARM 2010 80 80 37
3 DELMARPHARM 2012 195 275 24
4 DELMARPHARM 2012 195 275 52
5 DELMARPHARM 2012 195 275 99
6 DELMARPHARM 2012 195 275 20
7 DELMARPHARM 2013 16 291 3
8 DELMARPHARM 2013 16 291 4
9 DELMARPHARM 2013 16 291 9
10 DELMARPHARM 2014 14 305 7
11 DELMARPHARM 2014 14 305 7
12 DELMARPHARM 2015 21 246 7
13 DELMARPHARM 2015 21 246 5
14 DELMARPHARM 2015 21 246 4
15 DELMARPHARM 2015 21 246 5
16 DELMARPHARM 2017 10 61 10
Otherwise you can do:
DF %>%
group_by(company, year) %>%
mutate(N_C = as.numeric(as.character(N_C))) %>%
summarise(Sum_Year = sum(N_C)) %>%
mutate(
year = as.integer(as.character(year)),
Sum_Five_Year = sapply(year, function(x) sum(Sum_Year[between(year, x - 5 + 1, x)]))
) %>%
left_join(DF %>% select(-Sum_Year), by = c("company", "year"))
If you'd like to get rid of the duplicated format, just leave out the join at the end:
DF %>%
group_by(company, year) %>%
mutate(N_C = as.numeric(as.character(N_C))) %>%
summarise(Sum_Year = sum(N_C)) %>%
mutate(
year = as.integer(as.character(year)),
Sum_Five_Year = sapply(year, function(x) sum(Sum_Year[between(year, x - 5 + 1, x)]))
)
Output:
# A tibble: 6 x 4
# Groups: company [1]
company year Sum_Year Sum_Five_Year
<chr> <int> <dbl> <dbl>
1 DELMARPHARM 2010 80 80
2 DELMARPHARM 2012 195 275
3 DELMARPHARM 2013 16 291
4 DELMARPHARM 2014 14 305
5 DELMARPHARM 2015 21 246
6 DELMARPHARM 2017 10 61
Upvotes: 4