haltux
haltux

Reputation: 61

Match intervals with events in R

I have an event table and a set of intervals. I want to associate a boolean with each event, that is set to T is the event is in the interval.

My inputs:

intervals:
begin                      end
2015-03-01 14:02:00        2015-03-01 14:09:00
2015-03-01 15:13:00        2015-03-01 15:54:00
2015-03-01 16:02:00        2015-03-01 19:09:00

events:
id           date
1            2015-03-01 14:01:00
2            2015-03-01 14:03:00
3            2015-03-01 14:07:00
4            2015-03-01 15:55:00
5            2015-03-01 17:02:00

My output would be:

id           inInterval
1            F
2            T
3            T
4            F
5            T

I have millions of events and tens of thousands of intervals. For performance reasons I don't want to loop on the events and I don't want to process all events for each interval.

What I can easily do is get on ordered data.table of events and intervals start/end:

events:
id           date                     start       end
1            2015-03-01 14:01:00      
NA           2015-03-01 14:02:00      T
2            2015-03-01 14:03:00
3            2015-03-01 14:07:00
NA           2015-03-01 14:09:00                  T
NA           2015-03-01 15:13:00      T
NA           2015-03-01 15:54:00                  T
4            2015-03-01 15:55:00
NA           2015-03-01 16:02:00      T
5            2015-03-01 17:02:00
NA           2015-03-01 19:09:00                  T

Now I am stuck on the last part: setting inInterval=T for all lines between a start=T and a end=T. It clearly requires some kind of lag, but I am missing the required feature.

Any help would be appreciated. Thanks

EDIT: It seems that this existing question provides some insights: What is an efficient method for partitioning and aggregating intervals from timestamped rows in a data frame? It was suggested once I published my question. Basically findInterval should help me solve my issue.

Upvotes: 2

Views: 278

Answers (2)

Davis Vaughan
Davis Vaughan

Reputation: 2950

With ivs and iv_between():

library(ivs)
library(dplyr, warn.conflicts = FALSE)

intervals <- tibble(
  begin = c(
    "2015-03-01 14:02:00",
    "2015-03-01 15:13:00",
    "2015-03-01 16:02:00"
  ),
  end = c(
    "2015-03-01 14:09:00",
    "2015-03-01 15:54:00",
    "2015-03-01 19:09:00"
  )
)

intervals <- intervals %>%
  mutate(
    begin = as.POSIXct(begin, tz = "UTC"),
    end = as.POSIXct(end, tz = "UTC")
  ) %>%
  mutate(
    range = iv(begin, end),
    .keep = "unused"
  )

events <- tibble(
  id = 1:5,
  date = c(
    "2015-03-01 14:01:00",
    "2015-03-01 14:03:00",
    "2015-03-01 14:07:00",
    "2015-03-01 15:55:00",
    "2015-03-01 17:02:00"
  )
)

events <- events %>%
  mutate(date = as.POSIXct(date, tz = "UTC"))

intervals
#> # A tibble: 3 × 1
#>                                        range
#>                                   <iv<dttm>>
#> 1 [2015-03-01 14:02:00, 2015-03-01 14:09:00)
#> 2 [2015-03-01 15:13:00, 2015-03-01 15:54:00)
#> 3 [2015-03-01 16:02:00, 2015-03-01 19:09:00)

events %>%
  mutate(in_interval = iv_between(date, intervals$range))
#> # A tibble: 5 × 3
#>      id date                in_interval
#>   <int> <dttm>              <lgl>      
#> 1     1 2015-03-01 14:01:00 FALSE      
#> 2     2 2015-03-01 14:03:00 TRUE       
#> 3     3 2015-03-01 14:07:00 TRUE       
#> 4     4 2015-03-01 15:55:00 FALSE      
#> 5     5 2015-03-01 17:02:00 TRUE

Upvotes: 0

Tensibai
Tensibai

Reputation: 15784

My newbie idea in base R (given the intervals table is ordered and that there's no overlapping intervals):

pint <- as.POSIXct(as.vector(t(intervals)))
results <- data.frame(
            id = events$id,
            inInterval = findInterval( as.POSIXct(events$date), pint) %% 2 != 0
         )

Gives:

  id inInterval
1  1      FALSE
2  2       TRUE
3  3       TRUE
4  4      FALSE
5  5       TRUE

findInterval return the index in the intervals found:

  • if it's odd, we're in a real interval return TRUE
  • if it's even we're outside real intervals. return FALSE

Upvotes: 1

Related Questions