Reputation: 655
I currently have two sample datasets: score
and status
. In the real dataset, there are more than two people.
Data
score
Person Score ScoreDateTime
Person A 18 12/6/2020 5:06
Person A 18 12/6/2020 8:00
Person A 20 12/7/2020 7:40
Person B 18 12/27/2020 1:56
Person B 19 12/27/2020 20:00
Person B 18 12/28/2020 20:00
Person B 20 12/29/2020 20:00
status
Person Status StatusDateTime
Person A Woke up 12/7/2020 2:00
Person A Woke up 12/7/2020 4:00
Person A Woke up 12/7/2020 6:00
Person A Woke up 12/7/2020 8:00
Person A Woke up 12/7/2020 10:00
Person B Woke up 12/27/2020 5:00
Person B Woke up 12/28/2020 2:00
Person B Woke up 12/28/2020 4:00
Person B Woke up 12/28/2020 22:00
Person B Woke up 12/28/2020 23:59
I want to filter status
to only keep rows that are in between the ScoreDateTimes
where the Score
is less than 19.
The time range should be from when the Score
is less than 19 to up until when the Score
is 19 or greater. So:
Desired status
dataset
Person Status StatusDateTime
Person A Woke up 12/7/2020 2:00
Person A Woke up 12/7/2020 4:00
Person A Woke up 12/7/2020 6:00
Person B Woke up 12/27/2020 5:00
Person B Woke up 12/28/2020 22:00
Person B Woke up 12/28/2020 23:59
Any help would be greatly appreciated (and a dplyr approach would be awesome).
Code
score <- structure(list(Person = c("Person A", "Person A", "Person A",
"Person B", "Person B", "Person B", "Person B"), Score = c(18L,
18L, 20L, 18L, 19L, 18L, 20L), ScoreDateTime = c("12/6/2020 5:06",
"12/6/2020 8:00", "12/7/2020 7:40", "12/27/2020 1:56", "12/27/2020 20:00",
"12/28/2020 20:00", "12/29/2020 20:00")), class = "data.frame", row.names = c(NA,
-7L))
status <- structure(list(Person = c("Person A", "Person A", "Person A",
"Person A", "Person A", "Person B", "Person B", "Person B", "Person B",
"Person B", "Person B", "Person B"), Status = c("Woke up", "Woke up",
"Woke up", "Woke up", "Woke up", "Woke up", "Woke up", "Woke up",
"Woke up", "Woke up", "Woke up", "Woke up"), StatusDateTime = c("12/7/2020 2:00",
"12/7/2020 4:00", "12/7/2020 6:00", "12/7/2020 8:00", "12/7/2020 10:00",
"12/27/2020 5:00", "12/28/2020 2:00", "12/28/2020 4:00", "12/28/2020 22:00",
"12/28/2020 23:59", "12/29/2020 2:00", "12/29/2020 22:00")), class = "data.frame", row.names = c(NA,
-12L))
Upvotes: 0
Views: 71
Reputation: 27732
here is a data.table
approach
library(date.table)
# Sample data -------------------
score <- fread("Person Score ScoreDateTime
PersonA 18 12/6/2020T5:06
PersonA 18 12/6/2020T8:00
PersonA 20 12/7/2020T7:40
PersonB 18 12/27/2020T1:56
PersonB 19 12/27/2020T20:00
PersonB 18 12/28/2020T20:00
PersonB 20 12/29/2020T20:00")
status <- fread("Person Status StatusDateTime
PersonA Wokeup 12/7/2020T2:00
PersonA Wokeup 12/7/2020T4:00
PersonA Wokeup 12/7/2020T6:00
PersonA Wokeup 12/7/2020T8:00
PersonA Wokeup 12/7/2020T10:00
PersonB Wokeup 12/27/2020T5:00
PersonB Wokeup 12/28/2020T2:00
PersonB Wokeup 12/28/2020T4:00
PersonB Wokeup 12/28/2020T22:00
PersonB Wokeup 12/28/2020T23:59")
score[, ScoreDateTime := as.POSIXct(ScoreDateTime, format = "%m/%d/%YT%H:%M")]
status[, StatusDateTime := as.POSIXct(StatusDateTime, format = "%m/%d/%YT%H:%M")]
# Code ---------------------
score[, ScoreDateTime2 := shift(ScoreDateTime, n = 1, type = "lead") - 1, by = Person]
status[ score[ Score <= 18, ], match := 1, on = .(Person, StatusDateTime >= ScoreDateTime, StatusDateTime <= ScoreDateTime2)]
status[ match == 1,][, match := NULL][]
# Person Status StatusDateTime
# 1: PersonA Wokeup 2020-12-07 02:00:00
# 2: PersonA Wokeup 2020-12-07 04:00:00
# 3: PersonA Wokeup 2020-12-07 06:00:00
# 4: PersonB Wokeup 2020-12-27 05:00:00
# 5: PersonB Wokeup 2020-12-28 22:00:00
# 6: PersonB Wokeup 2020-12-28 23:59:00
Upvotes: 1