user13317

Reputation: 505

setdiff in lubridate

Trying to figure out why this doesn't work. What I'm looking to do is break up test_int_1 into 3 segments: before test_int_2, test_int_2 itself, and after test_int_2.

library(lubridate)

test_int_1 <- interval(ymd_hms('2024-04-29 17:01:00'), ymd_hms('2024-04-29 18:00:00'))
test_int_2 <- interval(ymd_hms('2024-04-29 17:28:00'), ymd_hms('2024-04-29 17:56:00'))
test_int_2 %within% test_int_1

setdiff(test_int_1, test_int_2)
# Error in setdiff.Interval(test_int_1, test_int_2) : 
#  Cases 1 result in discontinuous intervals.

Not sure if I'm understanding setdiff correctly.

Upvotes: 1

Views: 69

Answers (2)

Davis Vaughan

Reputation: 2960

The sort-and-diff approach does work quite nicely for this particular problem.

You might also consider the ivs package, which has a whole collection of tools for working with intervals. Notably, it provides iv_set_difference(), which could be useful here if your real problem is more complicated and involves a larger set of intervals.

library(ivs)
library(lubridate)

test_int_1 <- iv(
  ymd_hms('2024-04-29 17:01:00'),
  ymd_hms('2024-04-29 18:00:00')
)
test_int_2 <- iv(
  ymd_hms('2024-04-29 17:28:00'),
  ymd_hms('2024-04-29 17:56:00')
)

# Removes 2nd interval from 1st interval set
iv_set_difference(test_int_1, test_int_2)
#> <iv<datetime<UTC>>[2]>
#> [1] [2024-04-29 17:01:00, 2024-04-29 17:28:00)
#> [2] [2024-04-29 17:56:00, 2024-04-29 18:00:00)

# In this particular case, sort and diff works too
x <- sort(
  c(
    iv_start(test_int_1),
    iv_end(test_int_1),
    iv_start(test_int_2),
    iv_end(test_int_2)
  )
)
iv_diff(x)
#> <iv<datetime<UTC>>[3]>
#> [1] [2024-04-29 17:01:00, 2024-04-29 17:28:00)
#> [2] [2024-04-29 17:28:00, 2024-04-29 17:56:00)
#> [3] [2024-04-29 17:56:00, 2024-04-29 18:00:00)

Created on 2024-11-28 with reprex v2.1.1
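
To illustrate the "larger set of intervals" case, here is a minimal sketch that removes two intervals from the first one in a single call. The `busy` vector and its times are invented for this example; `iv()` and `iv_set_difference()` are the ivs functions already used above.

```r
library(ivs)
library(lubridate)

test_int_1 <- iv(
  ymd_hms('2024-04-29 17:01:00'),
  ymd_hms('2024-04-29 18:00:00')
)

# Hypothetical set of intervals to remove (times invented for illustration)
busy <- iv(
  ymd_hms(c('2024-04-29 17:10:00', '2024-04-29 17:28:00')),
  ymd_hms(c('2024-04-29 17:15:00', '2024-04-29 17:56:00'))
)

# Everything in test_int_1 that is not covered by any element of busy
remaining <- iv_set_difference(test_int_1, busy)
remaining
```

The result should be the pieces of test_int_1 left over once both `busy` intervals are cut out, which is the kind of problem where sort-and-diff stops being convenient.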

Upvotes: 0

Andrew Gustar

Reputation: 18435

One way to do this using lubridate functions is as follows...

int_diff(sort(c(int_start(c(test_int_1, test_int_2)),
                int_end(c(test_int_1, test_int_2)))))

[1] 2024-04-29 17:01:00 UTC--2024-04-29 17:28:00 UTC
[2] 2024-04-29 17:28:00 UTC--2024-04-29 17:56:00 UTC
[3] 2024-04-29 17:56:00 UTC--2024-04-29 18:00:00 UTC

This simply creates a sorted vector of the endpoints of both intervals; int_diff() then turns each consecutive pair of endpoints into an interval, which gives the three intervals you are looking for.
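
If you need this more than once, the same idea can be wrapped in a small function. This is only a sketch: `split_interval()` is a hypothetical helper name, not part of lubridate, and it assumes the inner interval lies entirely within the outer one. As the error message in the question suggests, setdiff() refuses this case because the result would be discontinuous, so the helper returns a vector of intervals instead.

```r
library(lubridate)

# Hypothetical helper (not a lubridate function): split `outer` at the
# endpoints of `inner`, returning the pieces as a vector of Intervals.
split_interval <- function(outer, inner) {
  # Assumption: inner must lie completely inside outer
  stopifnot(inner %within% outer)
  # Sorted cut points; unique() avoids zero-length pieces when the two
  # intervals share an endpoint
  cuts <- sort(unique(c(int_start(outer), int_start(inner),
                        int_end(inner), int_end(outer))))
  int_diff(cuts)
}

test_int_1 <- interval(ymd_hms('2024-04-29 17:01:00'), ymd_hms('2024-04-29 18:00:00'))
test_int_2 <- interval(ymd_hms('2024-04-29 17:28:00'), ymd_hms('2024-04-29 17:56:00'))

parts <- split_interval(test_int_1, test_int_2)
parts
```

Dropping the middle element, for example with parts[-2], leaves just the "before" and "after" segments.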

Upvotes: 1
