How can I filter for most recent occurrence within a time window?

Question

I have a data stream with a time, an ID, an event type (A or B), and (currently) blank co-occurrence columns. I want to go through the dataset and, for every B event, check whether there was an A within the previous 5 seconds. If so, that A event's row should receive the ID of the B event in its co-occurrence column. In the rare case that an A has more than one co-occurring B, the second ID goes into a second column (or both could go into the same column to be dealt with later).

I can achieve most of the desired result using a loop and some logic, but there are times when multiple Bs occur within 5 seconds after an A, or multiple As occur within 5 seconds before a B, so just checking the previous line (current line - 1) doesn't capture these.

An example data-stream looks like this:

Time     ID  Event Co1 Co2
7:47:28  X1  A
7:47:30  X2  B
7:48:02  X3  A
7:48:04  X4  A
7:48:05  X5  B
7:50:11  X1  A
7:50:12  X2  B
7:50:15  X5  B
7:55:50  X6  A
7:55:52  X2  B

And with correct processing should yield this:

Time     ID  Event Co1 Co2
7:47:28  X1  A     X2
7:47:30  X2  B
7:48:02  X3  A     X5
7:48:04  X4  A     X5
7:48:05  X5  B
7:50:11  X1  A     X2  X5
7:50:12  X2  B
7:50:15  X5  B
7:55:50  X6  A     X2
7:55:52  X2  B

Any help or pointers in the right direction would be much appreciated!

Edo · Accepted Answer

Given your input:

df <- read.table(text = "Time     ID  Event
7:47:28  X1  A
7:47:30  X2  B
7:48:02  X3  A
7:48:04  X4  A
7:48:05  X5  B
7:50:11  X1  A
7:50:12  X2  B
7:50:15  X5  B
7:55:50  X6  A
7:55:52  X2  B", header = TRUE)

# convert to HMS
df$Time <- lubridate::hms(df$Time)

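You can use slide_index_dfr from the slider package to collect, for each A row, the IDs of any B events in the following 5 seconds (.after = 5) and bind the results into a data frame. Looking forward from each A is equivalent to looking back 5 seconds from each B. You can then rename the columns and bind them back onto your df.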

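# Each window runs from the current row's time to 5 seconds later (.after = 5).
# If the window starts with an A, return the IDs of the Bs in it; otherwise return nothing.
xx <- slider::slide_index_dfr(
  df, df$Time,
  ~ if (.$Event[1] == "A") .$ID[.$Event == "B"] else character(),
  .after = 5
)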
colnames(xx) <- paste0("Col", seq_len(ncol(xx)))
cbind(df, xx)
#>          Time ID Event Col1 Col2
#> 1  7H 47M 28S X1     A   X2 
#> 2  7H 47M 30S X2     B  
#> 3   7H 48M 2S X3     A   X5 
#> 4   7H 48M 4S X4     A   X5 
#> 5   7H 48M 5S X5     B  
#> 6  7H 50M 11S X1     A   X2   X5
#> 7  7H 50M 12S X2     B  
#> 8  7H 50M 15S X5     B  
#> 9  7H 55M 50S X6     A   X2 
#> 10 7H 55M 52S X2     B
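
If you would rather not depend on slider, the same forward-window logic can be sketched in base R. This is only an illustration, not part of the original answer: the helper to_secs, the variable names, and the choice to pad missing matches with empty strings are my own assumptions, and it scans the whole data frame for every row, so it is only sensible for small inputs.

```r
df <- read.table(text = "Time     ID  Event
7:47:28  X1  A
7:47:30  X2  B
7:48:02  X3  A
7:48:04  X4  A
7:48:05  X5  B
7:50:11  X1  A
7:50:12  X2  B
7:50:15  X5  B
7:55:50  X6  A
7:55:52  X2  B", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: convert "H:M:S" strings to seconds since midnight
to_secs <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}
secs <- to_secs(df$Time)

window <- 5  # window length in seconds (assumed inclusive at both ends)

# For each A row, collect the IDs of B events in [t, t + window];
# B rows get an empty result
matches <- lapply(seq_len(nrow(df)), function(i) {
  if (df$Event[i] != "A") return(character())
  df$ID[df$Event == "B" & secs >= secs[i] & secs <= secs[i] + window]
})

# Pad every result with "" so all rows have the same number of columns
n_cols <- max(lengths(matches), 1)
co <- do.call(rbind, lapply(matches, function(m) {
  c(m, rep("", n_cols - length(m)))
}))
colnames(co) <- paste0("Co", seq_len(n_cols))

result <- cbind(df, as.data.frame(co, stringsAsFactors = FALSE))
result
```

The number of Co columns grows with the largest number of Bs found for any single A, so no match is silently dropped.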
