Reputation: 3486
I have a data frame like this:
data.frame(
date= structure(c(15461, 15553, 15675, 15826, 15918, 16040, 16191,
16405, 16556, 16648, 16770, 16922, 17014, 17136), class = "Date"),
n= 1:14
)
How can I summarize n
by academic year? Each academic year should range from December to the next August. For example, I would like to sum up n
over each academic year. Refactoring manually is not an option because of too many values and sometimes even missing ones.
Eventually, the refactoring should look like:
date a.y.
"2012-05-01" 2011/2012
"2012-08-01" 2011/2012
"2012-12-01" 2012/2013
"2013-05-01" 2012/2013
"2013-08-01" 2012/2013
"2013-12-01" 2013/2014
"2014-05-01" 2013/2014
"2014-12-01" 2014/2015
"2015-05-01" 2014/2015
"2015-08-01" 2014/2015
"2015-12-01" 2015/2016
"2016-05-01" 2015/2016
"2016-08-01" 2015/2016
"2016-12-01" 2016/2017
As you can notice, the dates follow a similar pattern but each academic year might have a different number of dates.
Upvotes: 0
Views: 113
Reputation: 2806
If I read this right as soon as we see a December entry we change school years. If that is true then the following code will work.
library(data.table)
library(lubridate)
df = data.frame(
date= structure(c(15461, 15553, 15675, 15826, 15918, 16040, 16191,
16405, 16556, 16648, 16770, 16922, 17014, 17136), class = "Date"),
n= 1:14
)
df$AcademicYear = cumsum(month(df$date) == 12)
setDT(df)
df[ , .(Sum = sum(n)), by = .(AcademicYear)]
AcademicYear Sum
1: 0 3
2: 1 12
3: 2 13
4: 3 27
5: 4 36
6: 5 14
EDIT
for refactoring you could do something like this. It looks for a month by the AcademicYear, then depending on the month, it knows to add or subtract a year and paste it together. Then, the column just needs to be renamed and summed as above.
df[ , "AcademicYear2" := ifelse(any(month(date) == 5), paste(year(date[month(date) == 5]) - 1,year(date[month(date) == 5]), sep = "/"),
ifelse(any(month(date) == 8), paste(year(date[month(date) == 8]) - 1,year(date[month(date) == 8]), sep = "/"),
paste(year(date[month(date) == 12]),year(date[month(date) == 12]) + 1, sep = "/"))), by = .(AcademicYear)]
> df
date n AcademicYear AcademicYear2
1: 2012-05-01 1 0 2011/2012
2: 2012-08-01 2 0 2011/2012
3: 2012-12-01 3 1 2012/2013
4: 2013-05-01 4 1 2012/2013
5: 2013-08-01 5 1 2012/2013
6: 2013-12-01 6 2 2013/2014
7: 2014-05-01 7 2 2013/2014
8: 2014-12-01 8 3 2014/2015
9: 2015-05-01 9 3 2014/2015
10: 2015-08-01 10 3 2014/2015
11: 2015-12-01 11 4 2015/2016
12: 2016-05-01 12 4 2015/2016
13: 2016-08-01 13 4 2015/2016
14: 2016-12-01 14 5 2016/2017
Edit 2
Decided to put all the code together. This should get you the final result you're looking for.
library(data.table)
library(lubridate)
df = data.frame(
date= structure(c(15461, 15553, 15675, 15826, 15918, 16040, 16191,
16405, 16556, 16648, 16770, 16922, 17014, 17136), class = "Date"),
n= 1:14
)
setDT(df)
df$AcademicYear = cumsum(month(df$date) == 12)
df[ , "AcademicYear2" := ifelse(any(month(date) == 5), paste(year(date[month(date) == 5]) - 1,year(date[month(date) == 5]), sep = "/"),
ifelse(any(month(date) == 8), paste(year(date[month(date) == 8]) - 1,year(date[month(date) == 8]), sep = "/"),
paste(year(date[month(date) == 12]),year(date[month(date) == 12]) + 1, sep = "/"))), by = .(AcademicYear)]
df = df[ , .(Sum = sum(n)), by = .(AcademicYear = AcademicYear2)]
> df
AcademicYear Sum
1: 2011/2012 3
2: 2012/2013 12
3: 2013/2014 13
4: 2014/2015 27
5: 2015/2016 36
6: 2016/2017 14
Upvotes: 1
Reputation: 492
Not sure what terms you want with what dates but you could use dplyr and mutate with a series of if else statements. Its slow but it works.
df <- data.frame(
date= structure(c(15461, 15553, 15675, 15826, 15918, 16040, 16191,
16405, 16556, 16648, 16770, 16922, 17014, 17136), class = "Date"),
n= 1:14
)
df <- mutate(df, term=ifelse(date >= as.Date("2012-05-01") & date <= as.Date("2012-08-01"), "1",
ifelse(date >= as.Date("2012-12-01") & date <= as.Date("2013-05-01"), "2",
ifelse(date >= as.Date("2013-12-01") & date <= as.Date("2014-12-01"), "3",
ifelse(date >= as.Date("2015-08-01") & date <= as.Date("2016-08-01"), "4",
"other")))))
Upvotes: 0