Reputation: 35
I have a data frame that looks like this:
person date1 date2 total amount overlap
A 2019-03-01 2019-03-16 50
A 2019-03-10 2019-03-31 100
A 2019-03-20 2019-03-31 70
B 2019-03-01 2019-03-12 200
B 2019-03-01 2019-03-20 130
B 2019-03-16 2019-03-31 100
I want to create a new column (overlap) that takes the value of each row plus other rows in the same group (here I wanted to group by the person column) that have date ranges that overlaps with the date range of the current row.
So to illustrate, the first row should be 50 (value of the current row) plus 100 (since the second row overlaps with the first one) which gives us a total of 150. Here, notice that we didn't include the third row since the date range of the third row does not overlap with the first row.
I have tried performing group_by(person) then mutate(overlap) but I don't know how to access other rows within the same group to know whether they overlap with the current one. I have also tried looking into the Overlap() function, but I'm not sure how to utilize this to get what I want.
Ideally, I would like to produce a table that look something like this:
person date1 date2 total amount overlap
A 2019-03-01 2019-03-16 50 150
A 2019-03-10 2019-03-31 100 220
A 2019-03-20 2019-03-31 70 170
B 2019-03-01 2019-03-12 200 330
B 2019-03-01 2019-03-20 130 430
B 2019-03-16 2019-03-31 100 230
Upvotes: 3
Views: 264
Reputation: 388817
We can group_by
Person
and do sum
of total_amount
which lies in between
date1
and date2
.
library(dplyr)
df %>%
mutate_at(vars(starts_with("date")), as.Date) %>%
group_by(person) %>%
mutate(overlap = purrr::map2_dbl(date1, date2,
~sum(total_amount[between(date1, .x, .y) | between(date2, .x, .y)])))
# person date1 date2 total_amount overlap
# <fct> <date> <date> <int> <dbl>
#1 A 2019-03-01 2019-03-16 50 150
#2 A 2019-03-10 2019-03-31 100 220
#3 A 2019-03-20 2019-03-31 70 170
#4 B 2019-03-01 2019-03-12 200 330
#5 B 2019-03-01 2019-03-20 130 430
#6 B 2019-03-16 2019-03-31 100 230
data
df <- structure(list(person = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), date1 = structure(c(1L, 2L, 4L, 1L,
1L, 3L), .Label = c("2019-03-01", "2019-03-10", "2019-03-16",
"2019-03-20"), class = "factor"), date2 = structure(c(2L, 4L,
4L, 1L, 3L, 4L), .Label = c("2019-03-12", "2019-03-16", "2019-03-20",
"2019-03-31"), class = "factor"), total_amount = c(50L, 100L,
70L, 200L, 130L, 100L)), class = "data.frame", row.names = c(NA, -6L))
Upvotes: 3