Reputation: 741
I have a dataset with a datetime column (the start) and a datetime_end column. After some data manipulation I want to break each interval down into one row per minute. Say I have this interval:
datetime datetime_end id disc
2019-03-19 12:47:28 2019-03-19 12:50:37 5-3 start
I would like to break it down by minute to get something like this:
datetime id disc
2019-03-19 12:48:00 5-3 start
2019-03-19 12:49:00 5-3 start
2019-03-19 12:50:00 5-3 start
2019-03-19 12:51:00 5-3 start
Here is the dummy data frame:
df1 <- data.frame(
  stringsAsFactors = FALSE,
  datetime = c("2019-03-19T13:26:52Z", "2019-03-19T13:26:19Z",
               "2019-03-19T13:23:46Z", "2019-03-19T13:22:20Z",
               "2019-03-19T13:09:56Z", "2019-03-19T13:06:04Z",
               "2019-03-19T13:05:21Z", "2019-03-19T13:04:37Z",
               "2019-03-19T12:47:28Z", "2019-03-19T12:46:42Z"),
  id = rep("5-3", 10),
  disc = c("car", "stop", "start", "stop", "start", "stop", "start",
           "stop", "start", "stop")
)
I tried the lubridate::interval() function to build an interval object (the travel interval), but I am struggling to break it down into one row per minute (as shown above). If someone knows a solution I would really appreciate it.
Here is my script:
library(tidyverse)
library(lubridate)
df <- df1 %>%
  mutate(datetime = lubridate::as_datetime(datetime)) %>%
  arrange(datetime) %>%
  mutate(datetime_end = lead(datetime),
         # Create an interval object.
         Travel_Interval =
           lubridate::interval(start = datetime, end = datetime_end)) %>%
  filter(!is.na(Travel_Interval)) %>%
  select(datetime, datetime_end, id, disc, Travel_Interval) %>%
  filter(disc == "start")
Upvotes: 0
Views: 270
Reputation: 5893
library(tidyverse)
library(lubridate)

df1 %>%
  mutate(datetime = lubridate::as_datetime(datetime)) %>%
  arrange(datetime) %>%
  mutate(datetime_end = lead(datetime)) %>%
  filter(!is.na(datetime_end)) %>%
  # Shifting by 30 s before round_date() rounds the seconds up to the next minute.
  mutate_at(vars(contains("datetime")),
            ~ round_date(.x + seconds(30), unit = "minute")) %>%
  mutate(diff = time_length(interval(datetime, datetime_end), unit = "minutes")) %>%
  mutate(time = map2(datetime, diff, ~ .x + minutes(seq(0, .y)))) %>%
  unnest(time)
Just wanted to post this since I was already working on it, despite the good answer already there. It uses the lubridate functions time_length() and interval() to get the sequence.
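To see what those two functions contribute, here is the same logic applied to the single interval from the question (a standalone illustration, not part of the pipeline above):

```r
library(lubridate)

# Shift each endpoint by 30 s so round_date() rounds seconds up to the
# next minute boundary.
start <- round_date(as_datetime("2019-03-19T12:47:28Z") + seconds(30),
                    unit = "minute")  # 12:48:00
end   <- round_date(as_datetime("2019-03-19T12:50:37Z") + seconds(30),
                    unit = "minute")  # 12:51:00

# time_length() measures the interval in whole minutes...
n <- time_length(interval(start, end), unit = "minutes")  # 3

# ...and adding minute offsets 0..n yields one timestamp per minute,
# matching the desired output in the question.
start + minutes(seq(0, n))
```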
Upvotes: 1
Reputation: 459
I'd use purrr::map2() for this:
# take df1 %>% mutate the datetime column to datetime format %>% sort by
# datetime %>% add datetime_end as the lead of datetime %>% filter out records
# with no recorded datetime_end %>% create the column 'minute' by using
# purrr::map2 to iterate over each (datetime, datetime_end) pair and apply the
# following function: {create a sequence of timestamps, in one-minute
# intervals, starting at the "minute ceiling" of 'datetime' and ending at the
# "minute ceiling" of 'datetime_end'} %>% since the resulting column is a
# list, we have to unnest the data
df <- df1 %>%
  mutate(datetime = as_datetime(datetime)) %>%
  arrange(datetime) %>%
  mutate(datetime_end = lead(datetime, n = 1L)) %>%
  filter(!is.na(datetime_end)) %>%
  mutate(minute = purrr::map2(datetime, datetime_end, function(start, stop) {
    seq.POSIXt(from = ceiling_date(start, 'minute'),
               to = ceiling_date(stop, 'minute'),
               by = 'min')
  })) %>%
  unnest(minute)
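To see what map2() produces for a single pair, the anonymous function can be run on the interval from the question by itself (a standalone sketch):

```r
library(lubridate)

start <- as_datetime("2019-03-19T12:47:28Z")
stop  <- as_datetime("2019-03-19T12:50:37Z")

# One timestamp per minute, from the minute ceiling of 'start' (12:48:00)
# to the minute ceiling of 'stop' (12:51:00).
mins <- seq.POSIXt(from = ceiling_date(start, "minute"),
                   to   = ceiling_date(stop, "minute"),
                   by   = "min")
mins  # 12:48:00, 12:49:00, 12:50:00, 12:51:00
```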
Note, however, that since you are effectively cutting timestamps into minute intervals using some form of rounding (taking the ceiling, in this instance), you will have to decide how to deal with boundary cases. E.g. the last row of the first disc == "stop" run ends with minute == 2019-03-19 12:48:00, but the first row of the subsequent disc == "start" run also starts with minute == 2019-03-19 12:48:00:
datetime id disc datetime_end minute
1 2019-03-19 12:46:42 5-3 stop 2019-03-19 12:47:28 2019-03-19 12:47:00
2 2019-03-19 12:46:42 5-3 stop 2019-03-19 12:47:28 2019-03-19 12:48:00
3 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:48:00
4 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:49:00
5 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:50:00
6 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:51:00
7 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:52:00
8 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:53:00
9 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:54:00
10 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:55:00
11 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:56:00
12 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:57:00
13 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:58:00
14 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:59:00
15 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 13:00:00
16 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 13:01:00
17 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 13:02:00
18 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 13:03:00
19 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 13:04:00
20 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 13:05:00
21 2019-03-19 13:04:37 5-3 stop 2019-03-19 13:05:21 2019-03-19 13:05:00
22 2019-03-19 13:04:37 5-3 stop 2019-03-19 13:05:21 2019-03-19 13:06:00
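One way to resolve that boundary overlap, assuming half-open semantics are acceptable (the question does not specify), is to drop the last minute of every expanded interval so the following run owns the shared boundary. A sketch, using the first four chronological rows of the question's data:

```r
library(dplyr)
library(lubridate)
library(purrr)
library(tidyr)

# First four rows of the question's df1, in chronological order.
df1 <- data.frame(
  datetime = c("2019-03-19T12:46:42Z", "2019-03-19T12:47:28Z",
               "2019-03-19T13:04:37Z", "2019-03-19T13:05:21Z"),
  id = "5-3",
  disc = c("stop", "start", "stop", "start"),
  stringsAsFactors = FALSE
)

df_halfopen <- df1 %>%
  mutate(datetime = as_datetime(datetime)) %>%
  arrange(datetime) %>%
  mutate(datetime_end = lead(datetime)) %>%
  filter(!is.na(datetime_end)) %>%
  mutate(minute = map2(datetime, datetime_end, function(start, stop) {
    s <- seq.POSIXt(from = ceiling_date(start, "minute"),
                    to   = ceiling_date(stop, "minute"),
                    by   = "min")
    head(s, -1)  # drop the shared boundary minute
  })) %>%
  unnest(minute)
```

Each minute now appears at most once across runs. Caveat: an interval whose two ceilings coincide produces an empty sequence and vanishes in unnest(), which may or may not be what you want.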
Upvotes: 2