Dambo
Dambo

Reputation: 3486

How to group by academic year?

I have a data frame like this:

  data.frame(
        date= structure(c(15461, 15553, 15675, 15826, 15918, 16040, 16191, 
        16405, 16556, 16648, 16770, 16922, 17014, 17136), class = "Date"), 
        n= 1:14
   )

How can I summarize n by academic year? Each academic year should range from December to the next August. For example, I would like to sum up n over each academic year. Refactoring manually is not an option because of too many values and sometimes even missing ones.

Eventually, the refactoring should look like:

  date         a.y.

"2012-05-01" 2011/2012
"2012-08-01" 2011/2012

"2012-12-01" 2012/2013
"2013-05-01" 2012/2013
"2013-08-01" 2012/2013

"2013-12-01" 2013/2014
"2014-05-01" 2013/2014

"2014-12-01" 2014/2015
"2015-05-01" 2014/2015
"2015-08-01" 2014/2015

"2015-12-01" 2015/2016
"2016-05-01" 2015/2016
"2016-08-01" 2015/2016

"2016-12-01" 2016/2017

As you can notice, the dates follow a similar pattern but each academic year might have a different number of dates.

Upvotes: 0

Views: 113

Answers (2)

Kristofersen
Kristofersen

Reputation: 2806

If I read this right as soon as we see a December entry we change school years. If that is true then the following code will work.

library(data.table)
library(lubridate)
df =   data.frame(
  date= structure(c(15461, 15553, 15675, 15826, 15918, 16040, 16191, 
                    16405, 16556, 16648, 16770, 16922, 17014, 17136), class = "Date"), 
  n= 1:14
)

df$AcademicYear = cumsum(month(df$date) == 12)
setDT(df)
df[ , .(Sum = sum(n)), by = .(AcademicYear)]

  AcademicYear Sum
1:            0   3
2:            1  12
3:            2  13
4:            3  27
5:            4  36
6:            5  14

EDIT

for refactoring you could do something like this. It looks for a month by the AcademicYear, then depending on the month, it knows to add or subtract a year and paste it together. Then, the column just needs to be renamed and summed as above.

df[ , "AcademicYear2" := ifelse(any(month(date) == 5), paste(year(date[month(date) == 5]) - 1,year(date[month(date) == 5]), sep = "/"), 
                               ifelse(any(month(date) == 8), paste(year(date[month(date) == 8]) - 1,year(date[month(date) == 8]), sep = "/"), 
                                      paste(year(date[month(date) == 12]),year(date[month(date) == 12]) + 1, sep = "/"))), by = .(AcademicYear)]

> df
          date  n AcademicYear AcademicYear2
 1: 2012-05-01  1            0     2011/2012
 2: 2012-08-01  2            0     2011/2012
 3: 2012-12-01  3            1     2012/2013
 4: 2013-05-01  4            1     2012/2013
 5: 2013-08-01  5            1     2012/2013
 6: 2013-12-01  6            2     2013/2014
 7: 2014-05-01  7            2     2013/2014
 8: 2014-12-01  8            3     2014/2015
 9: 2015-05-01  9            3     2014/2015
10: 2015-08-01 10            3     2014/2015
11: 2015-12-01 11            4     2015/2016
12: 2016-05-01 12            4     2015/2016
13: 2016-08-01 13            4     2015/2016
14: 2016-12-01 14            5     2016/2017

Edit 2

Decided to put all the code together. This should get you the final result you're looking for.

library(data.table)
library(lubridate)
df =   data.frame(
  date= structure(c(15461, 15553, 15675, 15826, 15918, 16040, 16191, 
                    16405, 16556, 16648, 16770, 16922, 17014, 17136), class = "Date"), 
  n= 1:14
)

setDT(df)
df$AcademicYear = cumsum(month(df$date) == 12)

df[ , "AcademicYear2" := ifelse(any(month(date) == 5), paste(year(date[month(date) == 5]) - 1,year(date[month(date) == 5]), sep = "/"), 
                               ifelse(any(month(date) == 8), paste(year(date[month(date) == 8]) - 1,year(date[month(date) == 8]), sep = "/"), 
                                      paste(year(date[month(date) == 12]),year(date[month(date) == 12]) + 1, sep = "/"))), by = .(AcademicYear)]


df = df[ , .(Sum = sum(n)), by = .(AcademicYear = AcademicYear2)]

> df
   AcademicYear Sum
1:    2011/2012   3
2:    2012/2013  12
3:    2013/2014  13
4:    2014/2015  27
5:    2015/2016  36
6:    2016/2017  14

Upvotes: 1

Bill Perry
Bill Perry

Reputation: 492

Not sure what terms you want with what dates but you could use dplyr and mutate with a series of if else statements. Its slow but it works.

df <- data.frame(
  date= structure(c(15461, 15553, 15675, 15826, 15918, 16040, 16191, 
                    16405, 16556, 16648, 16770, 16922, 17014, 17136), class = "Date"), 
  n= 1:14
)

df <- mutate(df, term=ifelse(date >= as.Date("2012-05-01")  & date <= as.Date("2012-08-01"),  "1",
        ifelse(date >= as.Date("2012-12-01")  & date <= as.Date("2013-05-01"),  "2",
          ifelse(date >= as.Date("2013-12-01")  & date <= as.Date("2014-12-01"),  "3",
         ifelse(date >= as.Date("2015-08-01")  & date <= as.Date("2016-08-01"),  "4",
           "other")))))

Upvotes: 0

Related Questions