johnnyong101
johnnyong101

Reputation: 35

Aggregate column values of rows that have overlapping range of dates

I have a data frame that looks like this:

person    date1       date2     total amount  overlap
  A     2019-03-01  2019-03-16       50               
  A     2019-03-10  2019-03-31      100               
  A     2019-03-20  2019-03-31       70               
  B     2019-03-01  2019-03-12      200               
  B     2019-03-01  2019-03-20      130               
  B     2019-03-16  2019-03-31      100 

I want to create a new column (overlap) that takes the value of each row plus other rows in the same group (here I wanted to group by the person column) that have date ranges that overlaps with the date range of the current row.

So to illustrate, the first row should be 50 (value of the current row) plus 100 (since the second row overlaps with the first one) which gives us a total of 150. Here, notice that we didn't include the third row since the date range of the third row does not overlap with the first row.

I have tried performing group_by(person) then mutate(overlap) but I don't know how to access other rows within the same group to know whether they overlap with the current one. I have also tried looking into the Overlap() function, but I'm not sure how to utilize this to get what I want.

Ideally, I would like to produce a table that look something like this:

person    date1       date2     total amount  overlap 
  A     2019-03-01  2019-03-16       50         150   
  A     2019-03-10  2019-03-31      100         220   
  A     2019-03-20  2019-03-31       70         170   
  B     2019-03-01  2019-03-12      200         330   
  B     2019-03-01  2019-03-20      130         430   
  B     2019-03-16  2019-03-31      100         230   

Upvotes: 3

Views: 264

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 388817

We can group_by Person and do sum of total_amount which lies in between date1 and date2.

library(dplyr)

df %>%
  mutate_at(vars(starts_with("date")),  as.Date) %>%
  group_by(person) %>%
  mutate(overlap = purrr::map2_dbl(date1, date2, 
             ~sum(total_amount[between(date1, .x, .y) | between(date2, .x, .y)])))

#  person date1      date2      total_amount overlap
#  <fct>  <date>     <date>            <int>   <dbl>
#1 A      2019-03-01 2019-03-16           50     150
#2 A      2019-03-10 2019-03-31          100     220
#3 A      2019-03-20 2019-03-31           70     170
#4 B      2019-03-01 2019-03-12          200     330
#5 B      2019-03-01 2019-03-20          130     430
#6 B      2019-03-16 2019-03-31          100     230

data

df <- structure(list(person = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("A", 
"B"), class = "factor"), date1 = structure(c(1L, 2L, 4L, 1L, 
1L, 3L), .Label = c("2019-03-01", "2019-03-10", "2019-03-16", 
"2019-03-20"), class = "factor"), date2 = structure(c(2L, 4L, 
4L, 1L, 3L, 4L), .Label = c("2019-03-12", "2019-03-16", "2019-03-20", 
"2019-03-31"), class = "factor"), total_amount = c(50L, 100L, 
70L, 200L, 130L, 100L)), class = "data.frame", row.names = c(NA, -6L))

Upvotes: 3

Related Questions