Michael Simonsen
Michael Simonsen

Reputation: 57

in R: How to create continuous date intervals from timestamps

I have this data frame with records of people coming to or leaving the country. 'date' and 'inout' indicates whether a person has traveled in ('I') or out ('O') of the country on what date. Id = 1 travels out of the country on 2008-10-06 and returns again on 2009-04-30.

Data:

df <- data.frame( id=c(1,1,2,2,2,2,3), 
date=c('2008-10-06','2009-04-30', '1999-07-25','2004-02-27','2005-06-09','2013-07-01','2010-09-07'), 
inout = c('O','I','I','O','I','O','I'))

  id       date inout
1  1 2008-10-06     O
2  1 2009-04-30     I
3  2 1999-07-25     I
4  2 2004-02-27     O
5  2 2005-06-09     I
6  2 2013-07-01     O
7  3 2010-09-07     I

I need a new data frame like this:

   id      start        end destination
1   1 1900-01-01 2008-10-06        home
2   1 2008-10-06 2009-04-30      abroad
3   1 2009-04-30 2017-08-01        home
4   2 1900-01-01 1999-07-25      abroad
5   2 1999-07-25 2004-02-27        home
6   2 2004-02-27 2005-06-09      abroad
7   2 2005-06-09 2013-07-01        home
8   2 2013-07-01 2017-08-01      abroad
9   3 1900-01-01 2010-09-07      abroad
10  3 2010-09-07 2017-08-01        home

Where each person's first stay starts with a default date 1900-01-01 and the last stay ends with the current date (2017-08-01). In this data frame, Id = 1 is home from 1900-01-01 to 2008-10-06 and abroad from 2008-10-06 to 2009-04-30 and home again from 2009-04-30 until 2017-08-01.

Can anyone help me with that. Preferably using the dplyr package if necessary. Best regards

Upvotes: 2

Views: 448

Answers (2)

d.b
d.b

Reputation: 32548

Base R. Pretty messy. Seems to work.

do.call(rbind, lapply(split(df, df$id), function(a) {
    cbind(id = rep(a$id, length.out = NROW(a)+1),
          setNames(object = data.frame(do.call(
        rbind, lapply(1:(NROW(a) + 1), function(i)
            c("1970-01-01", as.character(a$date), "2017-08-01")[i:(i + 1)])
    )),
    nm = c("Start", "End")),
    Destination = if (a$inout[1] == "O") {
        rep(x = c("home", "abroad"),
            length.out = NROW(a) + 1)
    } else{
        rep(x = c("abroad", "home"),
            length.out = NROW(a) + 1)
    })
}))
#    id      Start        End Destination
#1.1  1 1970-01-01 2008-10-06        home
#1.2  1 2008-10-06 2009-04-30      abroad
#1.3  1 2009-04-30 2017-08-01        home
#2.1  2 1970-01-01 1999-07-25      abroad
#2.2  2 1999-07-25 2004-02-27        home
#2.3  2 2004-02-27 2005-06-09      abroad
#2.4  2 2005-06-09 2013-07-01        home
#2.5  2 2013-07-01 2017-08-01      abroad
#3.1  3 1970-01-01 2010-09-07      abroad
#3.2  3 2010-09-07 2017-08-01        home

Upvotes: 0

Oriol Mirosa
Oriol Mirosa

Reputation: 2826

Here's my solution. It assumes that df has two consecutive rows for each trip, and therefore there is an even number of rows in the dataset (it will not work otherwise). In order to test this, then, I added a new row to df, which as pasted above only had 7 rows and therefore would have the fourth trip incomplete:

library(dplyr)
library(tidyr)
library(lubridate)

df %>%
  mutate(trips = rep(seq(1, n() / 2), each = 2)) %>%
  group_by(trips) %>%
  spread(inout, date) %>%
  mutate(start = if_else(date(I) < date(O), I, O),
         end = if_else(date(I) < date(O), O, I),
         destination = if_else(date(I) < date(O), 'home', 'abroad')) %>%
  ungroup %>%
  select(-c(trips, I, O))

##      id      start        end destination
##   <chr>     <fctr>     <fctr>       <chr>
## 1     1 2008-10-06 2009-04-30      abroad
## 2     2 1999-07-25 2004-02-27        home
## 3     2 2005-06-09 2013-07-01        home
## 4     3 2010-09-07 2012-03-08        home

Upvotes: 2

Related Questions