Reputation: 512
I am hoping to find a data.table
+ lubridate
solution to the following problem.
Suppose that I have the following dataset:
library(data.table)
library(lubridate)
library(magrittr)
sample <- data.table(start = c("2018-12-22 23:00:00",
"2018-12-23 06:00:00",
"2018-12-22 06:00:00",
"2018-12-23 06:00:00"),
end = c("2018-12-23 06:00:00",
"2018-12-23 13:00:00",
"2018-12-23 12:00:00",
"2018-12-24 01:00:00"),
store = c("A", "A", "B", "B"),
var = 1:4)
sample[, start:= ymd_hms(start)]
sample[, end := ymd_hms(end)]
The output looks like:
> sample
start end store var
1: 2018-12-22 23:00:00 2018-12-23 06:00:00 A 1
2: 2018-12-23 06:00:00 2018-12-23 13:00:00 A 2
3: 2018-12-22 06:00:00 2018-12-23 12:00:00 B 3
4: 2018-12-23 06:00:00 2018-12-24 01:00:00 B 4
Notice that on rows 1, 3, and 4, we have observations that span multiple dates. I would like to break up these observations so that they only take up one calendar date each. It is also possible that a start date and end date spans multiple days, but I would like to have one observation for each date. For the sample above, the data.table
should look like:
start end store var
1: 2018-12-22 23:00:00 2018-12-22 23:59:59 A 1
2: 2018-12-23 00:00:00 2018-12-23 06:00:00 A 1
3: 2018-12-23 06:00:00 2018-12-23 13:00:00 A 2
4: 2018-12-22 06:00:00 2018-12-22 23:59:59 B 3
5: 2018-12-23 00:00:00 2018-12-23 12:00:00 B 3
6: 2018-12-23 06:00:00 2018-12-23 23:59:59 B 4
7: 2018-12-24 00:00:00 2018-12-24 01:00:00 B 4
It is important the var
variable is the same when we break up the observation across multiple dates.
Thanks!
Upvotes: 2
Views: 97
Reputation: 160407
Using a simple helper-function,
library(lubridate)
func <- function(st, en) {
days <- seq(floor_date(min(st), unit = "days"),
ceiling_date(max(en), unit = "days"),
by = "1 day")
days <- c(st, days[-c(1, length(days))], en)
list(days[-length(days)], days[-1])
}
We get:
library(data.table)
sample[, setNames(func(start, end), c("start", "end")), by = .(store, var)]
# store var start end
# <char> <int> <POSc> <POSc>
# 1: A 1 2018-12-22 23:00:00 2018-12-23 00:00:00
# 2: A 1 2018-12-23 00:00:00 2018-12-23 06:00:00
# 3: A 2 2018-12-23 06:00:00 2018-12-23 13:00:00
# 4: B 3 2018-12-22 06:00:00 2018-12-23 00:00:00
# 5: B 3 2018-12-23 00:00:00 2018-12-23 12:00:00
# 6: B 4 2018-12-23 06:00:00 2018-12-24 00:00:00
# 7: B 4 2018-12-24 00:00:00 2018-12-24 01:00:00
Upvotes: 3
Reputation: 24722
# expand the rows
sample = sample[sample[, .(date = seq(as.IDate(start),as.IDate(end),1)), by=var], on="var"]
# fix the times
sample[, `:=`(
start = fifelse(as.IDate(start) == date,
ymd_hms(paste0(as.Date(start),as.ITime(start))),
ymd_hms(paste0(date,"00:00:00"))),
end = fifelse(as.IDate(end) == date,
ymd_hms(paste0(as.Date(end),as.ITime(end))),
ymd_hms(paste0(date, "23:59:59"))),
date = NULL
)]
Output:
start end store var
1: 2018-12-22 23:00:00 2018-12-22 23:59:59 A 1
2: 2018-12-23 00:00:00 2018-12-23 06:00:00 A 1
3: 2018-12-23 06:00:00 2018-12-23 13:00:00 A 2
4: 2018-12-22 06:00:00 2018-12-22 23:59:59 B 3
5: 2018-12-23 00:00:00 2018-12-23 12:00:00 B 3
6: 2018-12-23 06:00:00 2018-12-23 23:59:59 B 4
7: 2018-12-24 00:00:00 2018-12-24 01:00:00 B 4
Upvotes: 1