Amleto
Amleto

Reputation: 584

Sum of previous five years

I need to aggregate the previous 5 years of the N_C variable in each row.

For example: year 2017 - Sum_Five_Years = 10(2017)+21(2015)+14(2014)+16(2013) = 61

Data:

library(dplyr) 
DF<-data.frame(company = c("DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM","DEL MAR PHARM"),
           year= c("2017","2015","2015","2015","2013","2012","2012","2012","2010","2010","2015","2014","2014","2013","2013","2012"),
           N_C=   c("0","7","5","4","3","24","52","99","43","37","5","7","7","4","9","20"), Sum_Year = c("0","21","21","21","16","195","195","195","80","80","21","14","14","16","16","195"))
DF <- DF %>% arrange(year)

         company year N_C Sum_Year
1  DEL MAR PHARM 2010  43       80
2  DEL MAR PHARM 2010  37       80
3  DEL MAR PHARM 2012  24      195
4  DEL MAR PHARM 2012  52      195
5  DEL MAR PHARM 2012  99      195
6  DEL MAR PHARM 2012  20      195
7  DEL MAR PHARM 2013   3       16
8  DEL MAR PHARM 2013   4       16
9  DEL MAR PHARM 2013   9       16
10 DEL MAR PHARM 2014   7       14
11 DEL MAR PHARM 2014   7       14
12 DEL MAR PHARM 2015   7       21
13 DEL MAR PHARM 2015   5       21
14 DEL MAR PHARM 2015   4       21
15 DEL MAR PHARM 2015   5       21
16 DEL MAR PHARM 2017  10       10

Expected Outcome

DF$Sum_Five_Year <- cbind(c("80","80","275","275","275","275","291","291","291","305","305","246","246","246","246","61"))

> DF
         company year N_C Sum_Year Sum_Five_Year
1  DEL MAR PHARM 2010  43       80            80
2  DEL MAR PHARM 2010  37       80            80
3  DEL MAR PHARM 2012  24      195           275
4  DEL MAR PHARM 2012  52      195           275
5  DEL MAR PHARM 2012  99      195           275
6  DEL MAR PHARM 2012  20      195           275
7  DEL MAR PHARM 2013   3       16           291
8  DEL MAR PHARM 2013   4       16           291
9  DEL MAR PHARM 2013   9       16           291
10 DEL MAR PHARM 2014   7       14           305
11 DEL MAR PHARM 2014   7       14           305
12 DEL MAR PHARM 2015   7       21           246
13 DEL MAR PHARM 2015   5       21           246
14 DEL MAR PHARM 2015   4       21           246
15 DEL MAR PHARM 2015   5       21           246
16 DEL MAR PHARM 2017  10       10            61

I have tried the following code but it does not work:

 library(data.table)
 setDT(DF)  
 DF[, `:=` (Sum_Five_Year= sum(N_C)), by= list(company,cut(year, breaks = c(5), right = F))]

Any suggestion would be very appreciated :)

Upvotes: 1

Views: 182

Answers (1)

arg0naut91
arg0naut91

Reputation: 14764

With no additional packages, you could use sapply.

The code below assumes that Sum_Year has already been created. You could apply the following directly to your example:

distinct(DF, company, year, Sum_Year) %>%
  group_by(company) %>%
  mutate(
    year = as.integer(as.character(year)),
    Sum_Five_Year = sapply(year, function(x) sum(Sum_Year[between(year, x - 5 + 1, x)]))
  ) %>%
  left_join(DF %>% select(-Sum_Year), by = c("company", "year"))

Output:

# A tibble: 16 x 5
# Groups:   company [?]
   company      year Sum_Year Sum_Five_Year   N_C
   <chr>       <int>    <int>         <int> <int>
 1 DELMARPHARM  2010       80            80    43
 2 DELMARPHARM  2010       80            80    37
 3 DELMARPHARM  2012      195           275    24
 4 DELMARPHARM  2012      195           275    52
 5 DELMARPHARM  2012      195           275    99
 6 DELMARPHARM  2012      195           275    20
 7 DELMARPHARM  2013       16           291     3
 8 DELMARPHARM  2013       16           291     4
 9 DELMARPHARM  2013       16           291     9
10 DELMARPHARM  2014       14           305     7
11 DELMARPHARM  2014       14           305     7
12 DELMARPHARM  2015       21           246     7
13 DELMARPHARM  2015       21           246     5
14 DELMARPHARM  2015       21           246     4
15 DELMARPHARM  2015       21           246     5
16 DELMARPHARM  2017       10            61    10

Otherwise you can do:

DF %>%
  group_by(company, year) %>%
  mutate(N_C = as.numeric(as.character(N_C))) %>%
  summarise(Sum_Year = sum(N_C)) %>%
  mutate(
    year = as.integer(as.character(year)),
    Sum_Five_Year = sapply(year, function(x) sum(Sum_Year[between(year, x - 5 + 1, x)]))
    ) %>%
  left_join(DF %>% select(-Sum_Year), by = c("company", "year"))

If you'd like to get rid of the duplicated format, just leave out the join at the end:

DF %>%
  group_by(company, year) %>%
  mutate(N_C = as.numeric(as.character(N_C))) %>%
  summarise(Sum_Year = sum(N_C)) %>%
  mutate(
    year = as.integer(as.character(year)),
    Sum_Five_Year = sapply(year, function(x) sum(Sum_Year[between(year, x - 5 + 1, x)]))
    )

Output:

# A tibble: 6 x 4
# Groups:   company [1]
  company      year Sum_Year Sum_Five_Year
  <chr>       <int>    <dbl>         <dbl>
1 DELMARPHARM  2010       80            80
2 DELMARPHARM  2012      195           275
3 DELMARPHARM  2013       16           291
4 DELMARPHARM  2014       14           305
5 DELMARPHARM  2015       21           246
6 DELMARPHARM  2017       10            61

Upvotes: 4

Related Questions