Reputation: 489
probably very easy but struggling with it, looked for the answers on the web but they usually relate to cut
and snapshots, not intervals overlapping
require(data.table)
x = data.table(start=c("2017-04-18 18:05:00","2017-04-18 18:00:00",
"2017-04-18 21:05:00", "2017-04-18 16:05:00"),
end=c("2017-04-18 19:05:00","2017-04-18 21:30:00",
"2017-04-18 22:00:00", "2017-04-18 16:10:00"))
we have 4 observations and i need to allocate it to the corresponding hourly windows.
start end
1: 2017-04-18 18:05:00 2017-04-18 19:05:00
2: 2017-04-18 18:00:00 2017-04-18 21:30:00
3: 2017-04-18 21:05:00 2017-04-18 22:00:00
4: 2017-04-18 16:05:00 2017-04-18 16:10:00
the first one for example will have 55 min in 18:00 slot and 5 min in 19:00 slot, the next one 60 min in 18:00,19:00, 20:00 and 30 min in 21:00, the third one will have 55 min in 21:00 and the last one 5 min in 16:00
the result should be as below (sorry if i got the basic manual additions wrong;)
interval Q
1: 2017-04-18 16:00:00 5
2: 2017-04-18 17:00:00 0
3: 2017-04-18 18:00:00 115
4: 2017-04-18 19:00:00 65
5: 2017-04-18 20:00:00 120
6: 2017-04-18 21:00:00 85
of course there is a straight forward way to cut the series by minutes and perform a count by cut/interval, but i believe the problem is so common it must have a direct method. Preferably i would have the 0 value windows as well, but i can just sequence them if required
Upvotes: 3
Views: 1303
Reputation: 828
This is a solution using dplyr
First a helper function find_slots
is defined to generate all the hours between start
and end
. Next the Q
values are calculated.
Finally the data is summarized by grouping each slot.
library(dplyr)
find_slots <- function(a, b){
slots = seq(a-minute(a)*60-second(a),
b-minute(b)*60-second(b),
"hour")
dateseq = slots
dateseq[1] = a
r = c(dateseq, b)
d = as.numeric(difftime(r[-1], r[-length(r)], unit = 'min'))
data.frame(slot = slots, Q = d)
}
x %>%
rowwise %>%
do(find_slots(.$start, .$end)) %>%
ungroup() %>%
group_by(slot) %>%
summarize(Q = sum(Q))
Result (the 0 value for 17:00 is missing) :
slot Q
1 2017-04-18 16:00:00 5
2 2017-04-18 18:00:00 115
3 2017-04-18 19:00:00 65
4 2017-04-18 20:00:00 60
5 2017-04-18 21:00:00 85
6 2017-04-18 22:00:00 0
Edit: Using data.table
(Maybe faster but I'm not too experienced with data.table)
Also using the fasttime
library to speedup parsing of the datetimes.
library(fasttime)
library(data.table)
x = data.table(start=c("2017-04-18 18:05:00","2017-04-18 18:00:00",
"2017-04-18 21:05:00", "2017-04-18 16:05:00"),
end=c("2017-04-18 19:05:00","2017-04-18 21:30:00",
"2017-04-18 22:00:00", "2017-04-18 16:10:00"))
find_slots2 <- function(a, b){
a = fasttime::fastPOSIXct(a)
b = fasttime::fastPOSIXct(b)
slots = seq(a-data.table::minute(a)*60-data.table::second(a)*60,
b-data.table::minute(b)*60-data.table::second(b)*60,
"hour")
hourseq = c(a, slots[-1], b)
d = difftime(hourseq[-1], hourseq[-length(hourseq)], unit = 'min')
list(slot = slots, Q = d)
}
x[, find_slots2(start, end), by = 1:nrow(x)][order(slot), .(Q = as.numeric(sum(Q))), by = slot]
Upvotes: 3
Reputation: 52
Lubridate has a function lubridate::interval()
that could be useful here.
Upvotes: 0