Jon Romero
Jon Romero

Reputation: 4090

Merging date ranges in R

I have this dataframe (called signal):

         Date Sig
1  2012-03-25  Go
2  2012-04-15 Stop
3  2012-04-22 Stop
4  2012-05-13 Stop
5  2012-05-20 Stop
6  2012-06-24  Go
7  2012-09-23  Go
8  2012-09-30  Go
9  2012-10-14 Stop
10 2012-12-02  Go
11 2012-12-16 Stop

And I am trying to merge/join the date ranges in order to create something like that:

        Start        Stop Sig
1  2012-03-25  2012-04-15 Go
2  2012-04-15  2012-06-24 Stop
3  2012-06-24  2012-10-14 Go
4  2012-10-14  2012-12-02 Stop
5  2012-12-12  2012-12-16 Go

Any ideas, please?

Upvotes: 3

Views: 356

Answers (2)

Uwe
Uwe

Reputation: 42544

This old question hasn't got a proper answer so far. Here is a concise data.table solution using the rleid() function:

library(data.table)
setDT(signal)[order(Date), .(Start = first(Date)), by = .(rleid(Sig), Sig)][
  , Stop := shift(Start, type = "lead")][
    -.N, !"rleid"]
    Sig      Start       Stop
1:   Go 2012-03-25 2012-04-15
2: Stop 2012-04-15 2012-06-24
3:   Go 2012-06-24 2012-10-14
4: Stop 2012-10-14 2012-12-02
5:   Go 2012-12-02 2012-12-16

Explanation

setDT()coerces signal to class data.table. Then signal is ordered by Date and aggregated by continuous streaks of Sig using rleid() and by the value of Sig. The first row of each group is picked. To determine the stop dates the new Start column is shifted forward. Finally, the last row and the rleid grouping variable are removed.

Data

OP's data:

library(data.table)
signal <- fread(
  " i          Date Sig
1  2012-03-25  Go
2  2012-04-15 Stop
3  2012-04-22 Stop
4  2012-05-13 Stop
5  2012-05-20 Stop
6  2012-06-24  Go
7  2012-09-23  Go
8  2012-09-30  Go
9  2012-10-14 Stop
10 2012-12-02  Go
11 2012-12-16 Stop", drop = 1L, data.table = FALSE
)

Upvotes: 1

Mouad_Seridi
Mouad_Seridi

Reputation: 2716

The way I would go about it is to sort the segments and then collapse the ones that have the same value and are back to back.

require(data.table)

## generating a (similar ?) data set
df <- data.frame(dates = rep(as.Date('01-01-2010','%m-%d-%Y'),20) + sample(1:100,20), 
             sig = sample(c('stop', 'go'), replace = T, ))
             
df$sig <- as.character(df$sig)                 
             
df <- df[order(df$dates),]

### creating the lag variable for date 
df$dates2 <- c(NA,df$dates[1:nrow(df)-1])


### creating the lag variable for sig

df$sig2 <- c(NA,df$sig[1:nrow(df)-1])

## creating a variable that triggers a new segment 
df$grp <- as.numeric(df$sig != df$sig2)
df$grp[1] <- 0

### the cumsum of the trigger is actually the grouping variable 

df$grp2 <- cumsum(df$grp)


## using data table 
 dt <- data.table(df)


 dt2 <- dt[,.(start = min(dates), end = max(dates), sig = sig ), 
       grp]

the results

 grp      start        end  sig
 1:   0 2010-01-05 2010-04-11   go
 2:   0 2010-01-05 2010-04-11   go
 3:   0 2010-01-05 2010-04-11   go
 4:   0 2010-01-05 2010-04-11 stop
 5:   0 2010-01-05 2010-04-11 stop
 6:   0 2010-01-05 2010-04-11   go
 7:   0 2010-01-05 2010-04-11 stop
 8:   0 2010-01-05 2010-04-11   go

Upvotes: 1

Related Questions