Onno van der Horst
Onno van der Horst

Reputation: 73

Difference between timestamps per day, with a night break in R

I am trying to calculate the difference between two timestamps and aggregate this per date and uid. I have calculated the difference for each time interval, but would like to split these intervals per day (so create some kind of midnight break). Given that the differences between time intervals go overnight, I am having trouble calculating the time between each interval per day, and thus these breaks.

Here's a snippet of my data:

df <- structure(list(
start_timestamp = c("2013-03-27 01:21:23", "2013-03-28 07:11:58", "2013-03-28 09:09:56", "2013-03-29 00:19:32", "2013-03-29 02:22:53"), 
uid = c(0, 0, 0, 0, 0),
prev_start_timestamp = c("2013-03-27 01:13:26", "2013-03-27 05:58:53", "2013-03-28 08:41:41", "2013-03-28 10:47:01", "2013-03-29 02:17:26")), row.names = c("1", "2", "3", "4", 
"5"), class = "data.frame")

Usually I would work from this output and use dplyr or data.table to aggregate per day. but now, these time differences are simply calculated by substracting the time stamps. While, I want to have this differences split per day.

output, undesired

Expected output would be someting like this: But this one does not make any overnight breaks to separate the time each day.. this output indicates that some days have time intervals of longer than 24 hours, which is not possible of course..

enter image description here

Upvotes: 0

Views: 229

Answers (1)

chinsoon12
chinsoon12

Reputation: 25225

Here is an option using data.table::foverlaps:

#create a data.table of daily intervals
datetimes <- DT[, seq(trunc(min(start), "days"), trunc(max(end)+24*60*60), "days")]
days <- data.table(start=datetimes[-length(datetimes)], end=datetimes[-1L], key=cols)

#set keys on original dataset and perform overlaps before calculating usage per day
setkeyv(DT, cols)
foverlaps(DT, days)[,
    .(phone_usage=sum(pmin(i.end, end) - pmax(i.start, start))), 
    .(uid, date=start)]

output (will leave to OP to check manually if this is correct for the sample dataset):

   uid       date phone_usage
1:   0 2013-03-27  65344 secs
2:   0 2013-03-28  75192 secs
3:   0 2013-03-29   1499 secs

data (note that i have shortened OP's column names):

library(data.table)
DT <- data.table(
    end = c("2013-03-27 01:21:23", "2013-03-28 07:11:58", "2013-03-28 09:09:56", "2013-03-29 00:19:32", "2013-03-29 02:22:53"), 
    uid = c(0, 0, 0, 0, 0),
    start = c("2013-03-27 01:13:26", "2013-03-27 05:58:53", "2013-03-28 08:41:41", "2013-03-28 10:47:01", "2013-03-29 02:17:26"))
cols <- c('start', 'end')
DT[, (cols) := lapply(.SD, as.POSIXct, format="%Y-%m-%d %T"), .SDcols=cols]

Upvotes: 2

Related Questions