erc

Reputation: 10123

Create groups based on time period

How can I create a new grouping variable for my data based on 5-year steps?

So from this:

group <- c(rep("A", 7), rep("B", 10))
year <- c(2008:2014, 2005:2014)
dat <- data.frame(group, year)

   group year
1      A 2008
2      A 2009
3      A 2010
4      A 2011
5      A 2012
6      A 2013
7      A 2014
8      B 2005
9      B 2006
10     B 2007
11     B 2008
12     B 2009
13     B 2010
14     B 2011
15     B 2012
16     B 2013
17     B 2014

To this:

 > dat
   group year    period
1      A 2008 2005_2009
2      A 2009 2005_2009
3      A 2010 2010_2014
4      A 2011 2010_2014
5      A 2012 2010_2014
6      A 2013 2010_2014
7      A 2014 2010_2014
8      B 2005 2005_2009
9      B 2006 2005_2009
10     B 2007 2005_2009
11     B 2008 2005_2009
12     B 2009 2005_2009
13     B 2010 2010_2014
14     B 2011 2010_2014
15     B 2012 2010_2014
16     B 2013 2010_2014
17     B 2014 2010_2014

I guess I could use cut(dat$year, breaks = ??) but I don't know how to set the breaks.

Upvotes: 3

Views: 116

Answers (2)

Therkel

Reputation: 1438

Here is one way of doing it:

start <- floor(dat$year / 5) * 5   # first year of each 5-year period
dat$period <- paste(start, start + 4, sep = "_")

The trick is that floor(year / x) * x gives the largest multiple of x that is less than or equal to the year, which is the first year of its period.
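To make the rounding step concrete, here is a small sketch on a handful of years. The vector `years` is made up for illustration and is not part of the question's data.

```r
# Illustrative input only; not taken from the question's data frame
years <- c(2005, 2008, 2009, 2010, 2014)

# Round each year down to the nearest multiple of 5
start <- floor(years / 5) * 5

# Build the "start_end" label for each period
period <- paste(start, start + 4, sep = "_")

print(data.frame(years, start, period))
```

Integer division gives the same start values: `years %/% 5 * 5`.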


Here is a version that should work generally:

x <- 5
yearstart <- 2000
start <- floor((dat$year - yearstart) / x) * x + yearstart
dat$period <- paste(start, start + x - 1, sep = "_")

Use yearstart to anchor the groups: it guarantees that a chosen year, e.g. 2000, is the first year of a group even when that year is not a multiple of x.
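The general version could also be wrapped in a small helper. This is only a sketch: the function name `year_period` and its arguments `width` and `origin` are made up here, with `origin` playing the role of yearstart.

```r
# Hypothetical helper: label each year with its "start_end" period.
# width  = number of years per period
# origin = a year that must start a period (yearstart above)
year_period <- function(year, width = 5, origin = 2000) {
  start <- floor((year - origin) / width) * width + origin
  paste(start, start + width - 1, sep = "_")
}

# 3-year periods anchored at 2000, although 2000 is not a multiple of 3
year_period(c(2000, 2002, 2003), width = 3)

# The 5-year periods asked for in the question
year_period(c(2008, 2010, 2014))
```

With the question's data this would be used as `dat$period <- year_period(dat$year)`.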

Upvotes: 4

fdetsch

Reputation: 5308

cut should do the job if you create actual Date objects from your 'year' column. Note that cut with "5 years" starts the first bin at the earliest year in the data (2005 here).

## convert 'year' column to dates
yrs <- paste0(dat$year, "-01-01")
yrs <- as.Date(yrs)

## create cuts of 5 years and add them to data.frame
dat$period <- cut(yrs, "5 years")

## create desired factor levels
library(lubridate)

lvl <- as.Date(levels(dat$period))
lvl <- paste(year(lvl), year(lvl) + 4, sep = "_")
levels(dat$period) <- lvl

head(dat)
  group year    period
1     A 2008 2005_2009
2     A 2009 2005_2009
3     A 2010 2010_2014
4     A 2011 2010_2014
5     A 2012 2010_2014
6     A 2013 2010_2014
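As an alternative that skips the Date conversion, cut can also be applied to the numeric year directly, which is closer to what the question asked about setting breaks. This is a sketch that assumes the periods should start at 2005 and that `right = FALSE` (left-closed bins) is the wanted behaviour.

```r
group <- c(rep("A", 7), rep("B", 10))
year <- c(2008:2014, 2005:2014)
dat <- data.frame(group, year)

# Break points for left-closed bins [2005, 2010) and [2010, 2015)
breaks <- seq(2005, 2015, by = 5)

# One label per bin: its first year and its last year
starts <- head(breaks, -1)
labels <- paste(starts, starts + 4, sep = "_")

dat$period <- cut(dat$year, breaks = breaks, labels = labels, right = FALSE)
head(dat)
```

The result is a factor, as in the Date-based version. Years outside the range of the breaks would become NA, so the sequence has to cover all years in the data.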

Upvotes: 1
