Reputation: 111
Say I have the following dataframe, initialized as df:
ID date value
A 01/2012 1
A 03/2012 2
A 05/2012 4
A 07/2012 3
A 09/2012 7
A 11/2012 1
A 01/2013 2
A 03/2013 8
A 05/2013 13
A 07/2013 2
A 09/2013 5
A 11/2013 2
B 01/2012 3
B 03/2012 9
B 05/2012 1
B 07/2012 0
B 09/2012 12
B 11/2012 3
B 01/2013 1
B 03/2013 4
B 05/2013 3
B 07/2013 3
B 09/2013 1
B 11/2013 1
Where the date variable is of the form mm/yyyy. I am looking for a way to create an additional column in this data frame that gives cumulative sum of the value column grouped by ID and year. For example, the following output is what I want:
ID date value cumsum
A 01/2012 1 1
A 03/2012 2 3
A 05/2012 4 7
A 07/2012 3 10
A 09/2012 7 17
A 11/2012 1 18
A 01/2013 2 2
A 03/2013 8 10
A 05/2013 13 23
A 07/2013 2 25
A 09/2013 5 30
A 11/2013 2 32
B 01/2012 3 3
B 03/2012 9 12
B 05/2012 1 13
B 07/2012 0 13
B 09/2012 12 25
B 11/2012 3 28
B 01/2013 1 1
B 03/2013 4 5
B 05/2013 3 8
B 07/2013 3 11
B 09/2013 1 12
B 11/2013 1 13
As you can see, the sum resets for each year but also for each ID. Essentially I am unsure of how to create a cumulative sum that is grouped by 2 columns instead of just 1. Any help would be appreciated
Upvotes: 0
Views: 277
Reputation: 389335
For completion here is a base R option -
transform(df, cumsum = ave(value, ID, sub('.*/', '', date), FUN = cumsum))
# ID date value cumsum
#1 A 01/2012 1 1
#2 A 03/2012 2 3
#3 A 05/2012 4 7
#4 A 07/2012 3 10
#5 A 09/2012 7 17
#6 A 11/2012 1 18
#7 A 01/2013 2 2
#8 A 03/2013 8 10
#9 A 05/2013 13 23
#10 A 07/2013 2 25
#11 A 09/2013 5 30
#12 A 11/2013 2 32
#13 B 01/2012 3 3
#14 B 03/2012 9 12
#15 B 05/2012 1 13
#16 B 07/2012 0 13
#17 B 09/2012 12 25
#18 B 11/2012 3 28
#19 B 01/2013 1 1
#20 B 03/2013 4 5
#21 B 05/2013 3 8
#22 B 07/2013 3 11
#23 B 09/2013 1 12
#24 B 11/2013 1 13
Upvotes: 1
Reputation: 2419
Optional solution:
library(data.tabe)
setDT(df)[,cumsum:=cumsum(value),by=.(ID,substr(date, 4,8))][]
Upvotes: 2
Reputation: 887981
We convert to Date
class and extract the year
part to create grouping column before we do the cumulative sum
library(dplyr)
library(lubridate)
df1 %>%
group_by(ID, year = year(my(date))) %>%
mutate(cumsum = cumsum(value)) %>%
ungroup %>%
select(-year)
# A tibble: 24 x 4
ID date value cumsum
<chr> <chr> <int> <int>
1 A 01/2012 1 1
2 A 03/2012 2 3
3 A 05/2012 4 7
4 A 07/2012 3 10
5 A 09/2012 7 17
6 A 11/2012 1 18
7 A 01/2013 2 2
8 A 03/2013 8 10
9 A 05/2013 13 23
10 A 07/2013 2 25
# … with 14 more rows
df1 <- structure(list(ID = c("A", "A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "B", "B", "B",
"B", "B", "B"), date = c("01/2012", "03/2012", "05/2012", "07/2012",
"09/2012", "11/2012", "01/2013", "03/2013", "05/2013", "07/2013",
"09/2013", "11/2013", "01/2012", "03/2012", "05/2012", "07/2012",
"09/2012", "11/2012", "01/2013", "03/2013", "05/2013", "07/2013",
"09/2013", "11/2013"), value = c(1L, 2L, 4L, 3L, 7L, 1L, 2L,
8L, 13L, 2L, 5L, 2L, 3L, 9L, 1L, 0L, 12L, 3L, 1L, 4L, 3L, 3L,
1L, 1L)), class = "data.frame", row.names = c(NA, -24L))
Upvotes: 0
Reputation: 79358
You could do:
df %>%
group_by(ID, year = substr(date, 4,8)) %>%
mutate(cumsum = cumsum(value))
# A tibble: 24 x 5
# Groups: ID, year [4]
ID date value year cumsum
<chr> <chr> <int> <chr> <int>
1 A 01/2012 1 2012 1
2 A 03/2012 2 2012 3
3 A 05/2012 4 2012 7
4 A 07/2012 3 2012 10
5 A 09/2012 7 2012 17
6 A 11/2012 1 2012 18
7 A 01/2013 2 2013 2
8 A 03/2013 8 2013 10
9 A 05/2013 13 2013 23
10 A 07/2013 2 2013 25
Upvotes: 0