Slav
Slav

Reputation: 489

R divide intervals in hourly slots

probably very easy but struggling with it, looked for the answers on the web but they usually relate to cut and snapshots, not intervals overlapping

require(data.table)
x = data.table(start=c("2017-04-18 18:05:00","2017-04-18 18:00:00", 
"2017-04-18 21:05:00", "2017-04-18 16:05:00"), 
               end=c("2017-04-18 19:05:00","2017-04-18 21:30:00",
"2017-04-18 22:00:00", "2017-04-18 16:10:00"))

we have 4 observations and i need to allocate it to the corresponding hourly windows.

                 start                 end
1: 2017-04-18 18:05:00 2017-04-18 19:05:00
2: 2017-04-18 18:00:00 2017-04-18 21:30:00
3: 2017-04-18 21:05:00 2017-04-18 22:00:00
4: 2017-04-18 16:05:00 2017-04-18 16:10:00

the first one for example will have 55 min in 18:00 slot and 5 min in 19:00 slot, the next one 60 min in 18:00,19:00, 20:00 and 30 min in 21:00, the third one will have 55 min in 21:00 and the last one 5 min in 16:00

the result should be as below (sorry if i got the basic manual additions wrong;)

              interval   Q
1: 2017-04-18 16:00:00 5
2: 2017-04-18 17:00:00 0
3: 2017-04-18 18:00:00 115
4: 2017-04-18 19:00:00 65
5: 2017-04-18 20:00:00 120
6: 2017-04-18 21:00:00  85

of course there is a straight forward way to cut the series by minutes and perform a count by cut/interval, but i believe the problem is so common it must have a direct method. Preferably i would have the 0 value windows as well, but i can just sequence them if required

Upvotes: 3

Views: 1303

Answers (2)

Johan
Johan

Reputation: 828

This is a solution using dplyr

First a helper function find_slots is defined to generate all the hours between start and end. Next the Q values are calculated.

Finally the data is summarized by grouping each slot.

library(dplyr)

find_slots <- function(a, b){
    slots = seq(a-minute(a)*60-second(a),
                b-minute(b)*60-second(b),
                "hour")

    dateseq = slots
    dateseq[1] = a
    r = c(dateseq, b)

    d = as.numeric(difftime(r[-1], r[-length(r)], unit = 'min'))

    data.frame(slot = slots, Q = d)
}

x %>%
    rowwise %>%
    do(find_slots(.$start, .$end)) %>%
    ungroup() %>%
    group_by(slot) %>%
    summarize(Q = sum(Q))

Result (the 0 value for 17:00 is missing) :

                 slot   Q
1 2017-04-18 16:00:00   5
2 2017-04-18 18:00:00 115
3 2017-04-18 19:00:00  65
4 2017-04-18 20:00:00  60
5 2017-04-18 21:00:00  85
6 2017-04-18 22:00:00   0

Edit: Using data.table

(Maybe faster but I'm not too experienced with data.table)

Also using the fasttime library to speedup parsing of the datetimes.

library(fasttime)
library(data.table)

x = data.table(start=c("2017-04-18 18:05:00","2017-04-18 18:00:00", 
"2017-04-18 21:05:00", "2017-04-18 16:05:00"), 
               end=c("2017-04-18 19:05:00","2017-04-18 21:30:00",
"2017-04-18 22:00:00", "2017-04-18 16:10:00"))

find_slots2 <- function(a, b){
    a = fasttime::fastPOSIXct(a)
    b = fasttime::fastPOSIXct(b)
    slots = seq(a-data.table::minute(a)*60-data.table::second(a)*60,
                b-data.table::minute(b)*60-data.table::second(b)*60,
                "hour")

    hourseq = c(a, slots[-1], b)

    d = difftime(hourseq[-1], hourseq[-length(hourseq)], unit = 'min')

    list(slot = slots, Q = d)
}

x[, find_slots2(start, end), by = 1:nrow(x)][order(slot), .(Q = as.numeric(sum(Q))), by = slot]

Upvotes: 3

Nina Sonneborn
Nina Sonneborn

Reputation: 52

Lubridate has a function lubridate::interval() that could be useful here.

Upvotes: 0

Related Questions