Reputation: 4398
Objective:
I have a dataset, df, that I would like to group by ID and find the duration for rows that meet certain conditions: Focus == True, Read == True, and ID != "". However, I do not want to aggregate all rows with the same ID together; each consecutive run of an ID should stay in its own separate 'chunk'.
ID  Date                  Focus  Read
A   1/2/2020 5:00:00 AM   True   True
A   1/2/2020 5:00:05 AM   True   True
    1/3/2020 6:00:00 AM   True
    1/3/2020 6:00:05 AM   True
B   1/4/2020 7:00:00 AM   True   True
B   1/4/2020 7:00:02 AM   True   True
B   1/4/2020 7:00:10 AM   True   True
A   1/2/2020 7:30:00 AM   True   True
A   1/2/2020 7:30:20 AM   True   True
I would like this output:
ID  Duration  Date
A   5 sec     1/2/2020
B   10 sec    1/4/2020
A   20 sec    1/2/2020
dput:
structure(list(ID = structure(c(2L, 2L, 1L, 1L, 3L, 3L, 3L, 2L,
2L), .Label = c("", "A", "B"), class = "factor"), Date = structure(c(1L,
2L, 5L, 6L, 7L, 8L, 9L, 3L, 4L), .Label = c("1/2/2020 5:00:00 AM",
"1/2/2020 5:00:05 AM", "1/2/2020 7:30:00 AM", "1/2/2020 7:30:20 AM",
"1/3/2020 6:00:00 AM", "1/3/2020 6:00:05 AM", "1/4/2020 7:00:00 AM",
"1/4/2020 7:00:02 AM", "1/4/2020 7:00:10 AM"), class = "factor"),
Focus = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "True ", class = "factor"),
Read = structure(c(2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L), .Label = c("",
"True "), class = "factor")), class = "data.frame", row.names = c(NA,
-9L))
The code below works well, but it collapses every ID into a single row. How would I keep the chunks separate instead?
library(dplyr)
library(lubridate)
df %>%
  filter(as.logical(trimws(Read)), as.logical(trimws(Focus))) %>%
  mutate(Date = mdy_hms(Date)) %>%
  group_by(ID) %>%
  summarise(Duration = difftime(last(Date), first(Date), units = "secs"))
Any suggestion is appreciated.
Upvotes: 1
Views: 35
Reputation: 887118
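We could create a grouping variable with rleid (run-length-encoding id) from data.table, which assigns a new id each time the value of 'ID' changes between adjacent rows. Grouping by that id together with 'ID' keeps each run separate, and difftime is then applied to 'Date' after converting it to a date-time with mdy_hms.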
library(dplyr)
library(lubridate)
library(data.table)  # for rleid()
df %>%
  # keep rows where both flags are "True"; blank values become NA and are dropped
  filter(as.logical(trimws(Read)), as.logical(trimws(Focus))) %>%
  mutate(Date = mdy_hms(Date)) %>%
  # rleid() gives each consecutive run of ID its own group number
  group_by(grp = rleid(ID), ID) %>%
  summarise(Duration = difftime(last(Date), first(Date), units = "secs"),
            Date = as.Date(first(Date))) %>%
  ungroup() %>%
  select(-grp)
# A tibble: 3 x 3
# ID Duration Date
# <fct> <drtn> <date>
#1 A 5 secs 2020-01-02
#2 B 10 secs 2020-01-04
#3 A 20 secs 2020-01-02
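To see why grouping by a run id keeps the two 'A' chunks apart, here is a minimal base-R sketch. It assumes the blank-ID rows have already been filtered out, and the vectors id and date are typed in by hand from the question's data for illustration only. The cumsum expression is a hand-rolled stand-in for the run id, not the internals of rleid.

```r
# IDs and timestamps in row order, after the blank-ID rows are dropped
# (hand-copied from the question's data; illustration only)
id <- c("A", "A", "B", "B", "B", "A", "A")
date <- as.POSIXct(c("2020-01-02 05:00:00", "2020-01-02 05:00:05",
                     "2020-01-04 07:00:00", "2020-01-04 07:00:02",
                     "2020-01-04 07:00:10", "2020-01-02 07:30:00",
                     "2020-01-02 07:30:20"), tz = "UTC")

# Run id: start at 1 and add 1 whenever the value differs from the previous row
run_id <- cumsum(c(TRUE, id[-1] != id[-length(id)]))
run_id
# 1 1 2 2 2 3 3

# Duration in seconds per run: last timestamp minus first timestamp
dur <- tapply(as.numeric(date), run_id, function(x) x[length(x)] - x[1])
as.vector(dur)
# 5 10 20
```

The second run of 'A' gets run id 3 rather than 1, so it is summarised on its own. If you would rather not attach data.table, dplyr 1.1.0 and later provide consecutive_id(), which should serve the same purpose inside group_by().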
Upvotes: 1