Pam
Pam

Reputation: 111

Expand range of dates by another column by inserting rows in R

Consider a dataframe of the form.

idnum var start       end        between
17    A   1993-03-01    1993-03-12  1993-03-01
17    B   1993-01-02    1993-04-09  1993-04-03
20    A   1993-02-01    1993-02-01  1993-02-01
21    C   1993-05-09    1993-07-12  1993-07-10

with start, end and between of type date.

I would like to split the ranges based on the 'between' column so that my final data frame would look like this.

idnum var start          end        between
17    A   1993-03-01    1993-03-01  1993-03-01
17    A   1993-03-02    1993-03-12  
17    B   1993-01-02    1993-04-02  
17    B   1993-04-03    1993-04-03  1993-04-03
17    B   1993-04-05    1993-04-09  
20    A   1993-02-01    1993-02-01  1993-02-01
21    C   1993-05-09    1993-07-09  
21    C   1993-07-10    1993-07-10  1993-07-10
21    C   1993-07-10    1993-07-12  

Note that a combination of idnum and var determine unique records.For each between date, another row should have the start and end dates split at 'between'. The other start and end date should be adjusted to previous and later dates of between dates.

I am struggling to get this right in data.table.

The code to generate this data is-

idnum   <- c(17, 17, 20, 21)
var     <- c("A", "B", "A", "C")
start   <- c("1993-03-01","1993-01-02","1993-02-01","1993-05-09")
end     <- c("1993-03-12","1993-04-09","1993-02-01","1993-07-12")
between <- c("1993-03-01","1993-04-03","1993-02-01","1993-07-10"
start   <- as.Date(start)
end     <- as.Date(end)
between <- as.Date(between)

df      <- data.frame(idnum, var, start, end, between) 

Upvotes: 2

Views: 88

Answers (1)

Allan Cameron
Allan Cameron

Reputation: 174478

Here's a very pedestrian way of doing it:

do.call(rbind, lapply(split(df, seq_along(df$idnum)), function(x) { 
                            if(x$between[1] == x$end[1]) return(x)
                            x <-  x[c(1, 1),]
                            x$end[1] <- x$between[1]
                            x$start[2] <- x$between[1] + 1
                            x$between[2] <- NA
                            x}))

#>       idnum var      start        end    between
#> 1.1      17   A 1993-03-01 1993-03-01 1993-03-01
#> 1.1.1    17   A 1993-03-02 1993-03-12       <NA>
#> 2.2      17   B 1993-01-02 1993-04-03 1993-04-03
#> 2.2.1    17   B 1993-04-04 1993-04-09       <NA>
#> 3        20   A 1993-02-01 1993-02-01 1993-02-01
#> 4.4      21   C 1993-05-09 1993-07-10 1993-07-10
#> 4.4.1    21   C 1993-07-11 1993-07-12       <NA>

Created on 2020-07-26 by the reprex package (v0.3.0)

Upvotes: 2

Related Questions