Reputation: 4090
I have this dataframe (called signal
):
Date Sig
1 2012-03-25 Go
2 2012-04-15 Stop
3 2012-04-22 Stop
4 2012-05-13 Stop
5 2012-05-20 Stop
6 2012-06-24 Go
7 2012-09-23 Go
8 2012-09-30 Go
9 2012-10-14 Stop
10 2012-12-02 Go
11 2012-12-16 Stop
And I am trying to merge/join the date ranges in order to create something like that:
Start Stop Sig
1 2012-03-25 2012-04-15 Go
2 2012-04-15 2012-06-24 Stop
3 2012-06-24 2012-10-14 Go
4 2012-10-14 2012-12-02 Stop
5 2012-12-12 2012-12-16 Go
Any ideas, please?
Upvotes: 3
Views: 356
Reputation: 42544
This old question hasn't got a proper answer so far. Here is a concise data.table
solution using the rleid()
function:
library(data.table)
setDT(signal)[order(Date), .(Start = first(Date)), by = .(rleid(Sig), Sig)][
, Stop := shift(Start, type = "lead")][
-.N, !"rleid"]
Sig Start Stop 1: Go 2012-03-25 2012-04-15 2: Stop 2012-04-15 2012-06-24 3: Go 2012-06-24 2012-10-14 4: Stop 2012-10-14 2012-12-02 5: Go 2012-12-02 2012-12-16
setDT()
coerces signal
to class data.table
. Then signal
is ordered by Date
and aggregated by continuous streaks of Sig
using rleid()
and by the value of Sig
. The first row of each group is picked. To determine the stop dates the new Start
column is shifted forward. Finally, the last row and the rleid
grouping variable are removed.
OP's data:
library(data.table)
signal <- fread(
" i Date Sig
1 2012-03-25 Go
2 2012-04-15 Stop
3 2012-04-22 Stop
4 2012-05-13 Stop
5 2012-05-20 Stop
6 2012-06-24 Go
7 2012-09-23 Go
8 2012-09-30 Go
9 2012-10-14 Stop
10 2012-12-02 Go
11 2012-12-16 Stop", drop = 1L, data.table = FALSE
)
Upvotes: 1
Reputation: 2716
The way I would go about it is to sort the segments and then collapse the ones that have the same value and are back to back.
require(data.table)
## generating a (similar ?) data set
df <- data.frame(dates = rep(as.Date('01-01-2010','%m-%d-%Y'),20) + sample(1:100,20),
sig = sample(c('stop', 'go'), replace = T, ))
df$sig <- as.character(df$sig)
df <- df[order(df$dates),]
### creating the lag variable for date
df$dates2 <- c(NA,df$dates[1:nrow(df)-1])
### creating the lag variable for sig
df$sig2 <- c(NA,df$sig[1:nrow(df)-1])
## creating a variable that triggers a new segment
df$grp <- as.numeric(df$sig != df$sig2)
df$grp[1] <- 0
### the cumsum of the trigger is actually the grouping variable
df$grp2 <- cumsum(df$grp)
## using data table
dt <- data.table(df)
dt2 <- dt[,.(start = min(dates), end = max(dates), sig = sig ),
grp]
grp start end sig
1: 0 2010-01-05 2010-04-11 go
2: 0 2010-01-05 2010-04-11 go
3: 0 2010-01-05 2010-04-11 go
4: 0 2010-01-05 2010-04-11 stop
5: 0 2010-01-05 2010-04-11 stop
6: 0 2010-01-05 2010-04-11 go
7: 0 2010-01-05 2010-04-11 stop
8: 0 2010-01-05 2010-04-11 go
Upvotes: 1