in R: How to create continuous date intervals from timestamps

Question

I have this data frame with records of people coming to or leaving the country. 'date' and 'inout' indicates whether a person has traveled in ('I') or out ('O') of the country on what date. Id = 1 travels out of the country on 2008-10-06 and returns again on 2009-04-30.

Data:

df <- data.frame( id=c(1,1,2,2,2,2,3), 
date=c('2008-10-06','2009-04-30', '1999-07-25','2004-02-27','2005-06-09','2013-07-01','2010-09-07'), 
inout = c('O','I','I','O','I','O','I'))

  id       date inout
1  1 2008-10-06     O
2  1 2009-04-30     I
3  2 1999-07-25     I
4  2 2004-02-27     O
5  2 2005-06-09     I
6  2 2013-07-01     O
7  3 2010-09-07     I

I need a new data frame like this:

   id      start        end destination
1   1 1900-01-01 2008-10-06        home
2   1 2008-10-06 2009-04-30      abroad
3   1 2009-04-30 2017-08-01        home
4   2 1900-01-01 1999-07-25      abroad
5   2 1999-07-25 2004-02-27        home
6   2 2004-02-27 2005-06-09      abroad
7   2 2005-06-09 2013-07-01        home
8   2 2013-07-01 2017-08-01      abroad
9   3 1900-01-01 2010-09-07      abroad
10  3 2010-09-07 2017-08-01        home

Where each person's first stay starts with a default date 1900-01-01 and the last stay ends with the current date (2017-08-01). In this data frame, Id = 1 is home from 1900-01-01 to 2008-10-06 and abroad from 2008-10-06 to 2009-04-30 and home again from 2009-04-30 until 2017-08-01.

Can anyone help me with that. Preferably using the dplyr package if necessary. Best regards

d.b · Accepted Answer

Base R. Pretty messy. Seems to work.

do.call(rbind, lapply(split(df, df$id), function(a) {
    cbind(id = rep(a$id, length.out = NROW(a)+1),
          setNames(object = data.frame(do.call(
        rbind, lapply(1:(NROW(a) + 1), function(i)
            c("1970-01-01", as.character(a$date), "2017-08-01")[i:(i + 1)])
    )),
    nm = c("Start", "End")),
    Destination = if (a$inout[1] == "O") {
        rep(x = c("home", "abroad"),
            length.out = NROW(a) + 1)
    } else{
        rep(x = c("abroad", "home"),
            length.out = NROW(a) + 1)
    })
}))
#    id      Start        End Destination
#1.1  1 1970-01-01 2008-10-06        home
#1.2  1 2008-10-06 2009-04-30      abroad
#1.3  1 2009-04-30 2017-08-01        home
#2.1  2 1970-01-01 1999-07-25      abroad
#2.2  2 1999-07-25 2004-02-27        home
#2.3  2 2004-02-27 2005-06-09      abroad
#2.4  2 2005-06-09 2013-07-01        home
#2.5  2 2013-07-01 2017-08-01      abroad
#3.1  3 1970-01-01 2010-09-07      abroad
#3.2  3 2010-09-07 2017-08-01        home

in R: How to create continuous date intervals from timestamps

Answers (2)

Related Questions