Reputation: 519
I need to sum the number of days across multiple intervals in the same observations. I have seen many different examples in StackOverflow about this task. Still, I cannot reproduce them using my data, because I have to do it in more than two overlapping times, and across multiple intervals.
library(lubridate)
library(dplyr)
a <- c(as_date(0), as_date(8), as_date(80),as_date(60))
b <-c(as_date(2), as_date(20), as_date(100),as_date(80))
c <-c(as_date(1), as_date(16), as_date(95),as_date(85))
d <- c(as_date(100), as_date(19), as_date(120),as_date(100))
e <-c(as_date(0), as_date(50), as_date(101),as_date(65))
f <- c(as_date(150), as_date(100), as_date(200),as_date(200))
df <- data.frame(int.1 = interval(a, b), int.2 = interval(c, d), int.3 = interval(e, f))
I can sum the total time between the intervals, but the time that overlaps is included:
df %>%
mutate(overlapping.time = int.1 %/% days(1) + int.2 %/% days(1) + int.3 %/% days(1))
int.1 int.2 int.3 overlapping.time
1 1970-01-01 UTC--1970-01-03 UTC 1970-01-02 UTC--1970-04-11 UTC 1970-01-01 UTC--1970-05-31 UTC 251
2 1970-01-09 UTC--1970-01-21 UTC 1970-01-17 UTC--1970-01-20 UTC 1970-02-20 UTC--1970-04-11 UTC 65
3 1970-03-22 UTC--1970-04-11 UTC 1970-04-06 UTC--1970-05-01 UTC 1970-04-12 UTC--1970-07-20 UTC 144
4 1970-03-02 UTC--1970-03-22 UTC 1970-03-27 UTC--1970-04-11 UTC 1970-03-07 UTC--1970-07-20 UTC 170
Upvotes: 4
Views: 617
Reputation: 1438
Below is a function overlapping_days()
, which will take a set of interval columns and calculate the total amount of overlapping days. See inline comments for how it works. It covers intervals completely contained within another, partially overlapping, and makes no assumptions about the relationships between columns. Subtracting the result of the function from your previous calculation will get you what you want. Note that I modified the data I used a bit from what you originally posted.
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#>
#> date
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:lubridate':
#>
#> intersect, setdiff, union
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
a <- c(as_date(0), as_date(1), as_date(80),as_date(60))
b <-c(as_date(20), as_date(22), as_date(100),as_date(80))
c <-c(as_date(1), as_date(16), as_date(95),as_date(85))
d <- c(as_date(3), as_date(19), as_date(120),as_date(100))
e <-c(as_date(0), as_date(50), as_date(101),as_date(65))
f <- c(as_date(150), as_date(100), as_date(200),as_date(200))
df <- data.frame(int.1 = interval(a, b), int.2 = interval(c, d), int.3 = interval(e, f))
overlapping_days <- function(...) {
# Collect the vectors passed into a list
ll <- list(...)
# Create all possible 2-combinations for the number of columns passed in.
combinations <- combn(length(ll), 2)
# Create a column for each combination, and a row for each element in the vectors.
overlaps <- matrix(data = 0, nrow = length(ll[[1]]), ncol = length(combinations))
# Loop through the combinations
iterations <- seq_len(ncol(combinations))
for (k in iterations) {
# I'll refer to each of these indices as intervals -- they each represent
# a vector passed in.
i <- combinations[1, k]
j <- combinations[2, k]
overlaps[,k] <- case_when(
# If the interval i is within interval j, add i to the overlap
ll[[i]] %within% ll[[j]] ~ ll[[i]] %/% days(1),
# If the interval j is within interval i, add j to the overlap
ll[[j]] %within% ll[[i]] ~ ll[[j]] %/% days(1),
# If they overlap, either int_start(i) < int_end(j), or int_start(j) < int_end(i)
# Calculate the appropriate difference -- these look backwards but
# are needed so a positive number is produced.
int_overlaps(ll[[i]], ll[[j]]) & int_start(ll[[j]]) < int_end(ll[[i]]) ~
int_start(ll[[j]]) %--% int_end(ll[[i]]) %/% days(1),
int_overlaps(ll[[j]], ll[[i]]) & int_start(ll[[i]]) < int_end(ll[[j]]) ~
int_start(ll[[i]]) %--% int_end(ll[[j]]) %/% days(1),
# If none of these are true, the intervals do not overlap and we add 0 to
# the overlap amount.
TRUE ~ 0
)
}
# Sum across rows to get the total number of overlapping days.
rowSums(overlaps)
}
df %>%
mutate(overlapping.time = int.1 %/% days(1) + int.2 %/% days(1) + int.3 %/% days(1), overlap = overlapping_days(int.1, int.2, int.3))
#> Note: method with signature 'Timespan#Timespan' chosen for function '%/%',
#> target signature 'Interval#Period'.
#> "Interval#ANY", "ANY#Period" would also be valid
#> int.1 int.2
#> 1 1970-01-01 UTC--1970-01-21 UTC 1970-01-02 UTC--1970-01-04 UTC
#> 2 1970-01-02 UTC--1970-01-23 UTC 1970-01-17 UTC--1970-01-20 UTC
#> 3 1970-03-22 UTC--1970-04-11 UTC 1970-04-06 UTC--1970-05-01 UTC
#> 4 1970-03-02 UTC--1970-03-22 UTC 1970-03-27 UTC--1970-04-11 UTC
#> int.3 overlapping.time overlap
#> 1 1970-01-01 UTC--1970-05-31 UTC 172 24
#> 2 1970-02-20 UTC--1970-04-11 UTC 74 3
#> 3 1970-04-12 UTC--1970-07-20 UTC 144 24
#> 4 1970-03-07 UTC--1970-07-20 UTC 170 30
Upvotes: 3