Filtering dataset based on conditional time range from another dataset

Question

I currently have two sample datasets: score and status. In the real dataset, there are more than two people.

Data

score

Person      Score ScoreDateTime
Person A    18    12/6/2020 5:06        
Person A    18    12/6/2020 8:00        
Person A    20    12/7/2020 7:40        
Person B    18    12/27/2020 1:56       
Person B    19    12/27/2020 20:00      
Person B    18    12/28/2020 20:00      
Person B    20    12/29/2020 20:00

status

Person      Status   StatusDateTime
Person A    Woke up  12/7/2020 2:00     
Person A    Woke up  12/7/2020 4:00     
Person A    Woke up  12/7/2020 6:00     
Person A    Woke up  12/7/2020 8:00     
Person A    Woke up  12/7/2020 10:00        
Person B    Woke up  12/27/2020 5:00        
Person B    Woke up  12/28/2020 2:00        
Person B    Woke up  12/28/2020 4:00        
Person B    Woke up  12/28/2020 22:00       
Person B    Woke up  12/28/2020 23:59

I want to filter status to only keep rows that are in between the ScoreDateTimes where the Score is less than 19.

The time range should be from when the Score is less than 19 to up until when the Score is 19 or greater. So:

For Person A, I want all rows that are between 12/6/2020 5:06 - 12/7/2020 7:40
For Person B, I want all rows that are between 12/27/2020 1:56 - 12/27/2020 20:00 and 12/28/2020 20:00 - 12/29/2020 20:00

Desired status dataset

Person      Status   StatusDateTime
Person A    Woke up  12/7/2020 2:00     
Person A    Woke up  12/7/2020 4:00     
Person A    Woke up  12/7/2020 6:00  
Person B    Woke up  12/27/2020 5:00   
Person B    Woke up  12/28/2020 22:00       
Person B    Woke up  12/28/2020 23:59

Any help would be greatly appreciated (and a dplyr approach would be awesome).

Code

score <- structure(list(Person = c("Person A", "Person A", "Person A", 
"Person B", "Person B", "Person B", "Person B"), Score = c(18L, 
18L, 20L, 18L, 19L, 18L, 20L), ScoreDateTime = c("12/6/2020 5:06", 
"12/6/2020 8:00", "12/7/2020 7:40", "12/27/2020 1:56", "12/27/2020 20:00", 
"12/28/2020 20:00", "12/29/2020 20:00")), class = "data.frame", row.names = c(NA, 
-7L))

status <- structure(list(Person = c("Person A", "Person A", "Person A", 
"Person A", "Person A", "Person B", "Person B", "Person B", "Person B", 
"Person B", "Person B", "Person B"), Status = c("Woke up", "Woke up", 
"Woke up", "Woke up", "Woke up", "Woke up", "Woke up", "Woke up", 
"Woke up", "Woke up", "Woke up", "Woke up"), StatusDateTime = c("12/7/2020 2:00", 
"12/7/2020 4:00", "12/7/2020 6:00", "12/7/2020 8:00", "12/7/2020 10:00", 
"12/27/2020 5:00", "12/28/2020 2:00", "12/28/2020 4:00", "12/28/2020 22:00", 
"12/28/2020 23:59", "12/29/2020 2:00", "12/29/2020 22:00")), class = "data.frame", row.names = c(NA, 
-12L))

Wimpel · Accepted Answer

here is a data.table approach

library(date.table)
# Sample data -------------------
score <- fread("Person      Score ScoreDateTime
PersonA    18    12/6/2020T5:06        
PersonA    18    12/6/2020T8:00        
PersonA    20    12/7/2020T7:40        
PersonB    18    12/27/2020T1:56       
PersonB    19    12/27/2020T20:00      
PersonB    18    12/28/2020T20:00      
PersonB    20    12/29/2020T20:00")
status <- fread("Person      Status   StatusDateTime
PersonA    Wokeup  12/7/2020T2:00     
PersonA    Wokeup  12/7/2020T4:00     
PersonA    Wokeup  12/7/2020T6:00     
PersonA    Wokeup  12/7/2020T8:00     
PersonA    Wokeup  12/7/2020T10:00        
PersonB    Wokeup  12/27/2020T5:00        
PersonB    Wokeup  12/28/2020T2:00        
PersonB    Wokeup  12/28/2020T4:00        
PersonB    Wokeup  12/28/2020T22:00       
PersonB    Wokeup  12/28/2020T23:59")
score[, ScoreDateTime := as.POSIXct(ScoreDateTime, format = "%m/%d/%YT%H:%M")]
status[, StatusDateTime := as.POSIXct(StatusDateTime, format = "%m/%d/%YT%H:%M")]

# Code ---------------------
score[, ScoreDateTime2 := shift(ScoreDateTime, n = 1, type = "lead") - 1, by = Person]
status[ score[ Score <= 18, ], match := 1, on = .(Person, StatusDateTime >= ScoreDateTime, StatusDateTime <= ScoreDateTime2)]
status[ match == 1,][, match := NULL][]
#     Person Status      StatusDateTime
# 1: PersonA Wokeup 2020-12-07 02:00:00
# 2: PersonA Wokeup 2020-12-07 04:00:00
# 3: PersonA Wokeup 2020-12-07 06:00:00
# 4: PersonB Wokeup 2020-12-27 05:00:00
# 5: PersonB Wokeup 2020-12-28 22:00:00
# 6: PersonB Wokeup 2020-12-28 23:59:00

Filtering dataset based on conditional time range from another dataset

Answers (1)

Related Questions