Reputation: 27
I have a data-stream with time, ID, two event types (A and B), and (currently) blank co-occurrence columns. I want to go through the dataset, and for every B event, check if there was an A within the previous 5 seconds. If so, that A event line would receive the ID from the B event in its co-occurrence column. In the rare event that there are multiple, the second co-occurrence adds to a second column (or both could go into the same column to be dealt with later).
I can achieve most of the desired result using a loop and some logic, but there are times where there are multiple Bs that occur within 5 seconds of an A, or multiple As that happen within 5s before a B, so using current line -1 doesn't capture these.
An example data-stream looks like this:
Time ID Event Co1 Co2
7:47:28 X1 A
7:47:30 X2 B
7:48:02 X3 A
7:48:04 X4 A
7:48:05 X5 B
7:50:11 X1 A
7:50:12 X2 B
7:50:15 X5 B
7:55:50 X6 A
7:55:52 X2 B
And with correct processing should yield this:
Time ID Event Co1 Co2
7:47:28 X1 A X2
7:47:30 X2 B
7:48:02 X3 A X5
7:48:04 X4 A X5
7:48:05 X5 B
7:50:11 X1 A X2 X5
7:50:12 X2 B
7:50:15 X5 B
7:55:50 X6 A X2
7:55:52 X2 B
Any help or pointers in the right direction would be much appreciated!
Upvotes: 0
Views: 46
Reputation: 6969
Here is solution with foverlaps
function from data.table
package:
library(data.table)
dt <- read.table(text = "Time ID Event
07:47:28 X1 A
07:47:30 X2 B
07:48:02 X3 A
07:48:04 X4 A
07:48:05 X5 B
07:50:11 X6 A
07:50:12 X7 B
07:50:15 X8 B
07:55:50 X9 A
07:55:52 X10 B", header = TRUE, sep = " ", stringsAsFactors = FALSE)
# Use data.table
setDT(dt)
# Join dataset to self over the 5 second lookback period
dt[, time := as.ITime(Time)]
dt[, time.lookback := time - as.ITime("00:00:05")]
setkey(dt, time.lookback, time)
dt.join <- foverlaps(dt, dt)
dt.join <- dt.join[order(ID)]
# You should be able to simplify this part a lot:
dt.join <- dt.join[(Event == i.Event & time == i.time) | (Event == "A" & i.Event == "B" & time < i.time)]
setorder(dt.join, ID, Event, -i.Event, i.time)
dt.join[i.Event == "A", i.ID := NA]
dt.join[i.Event == "A", i.Event := NA]
dt.join[i.Event == "B" & time == i.time, i.ID := NA]
dt.join[i.Event == "B" & time == i.time, i.Event := NA]
dt.join[, rn := cumsum(i.Event == "B"), .(ID, Event)]
# Now brining the dataset back to original granularity:
res <- dcast(
dt.join,
formula = ID + Event ~ paste0("col", rn),
value.var = "i.ID"
)
res$colNA <- NULL
res
# ID Event col1 col2
# 1: X1 A X2 <NA>
# 2: X10 B <NA> <NA>
# 3: X2 B <NA> <NA>
# 4: X3 A X5 <NA>
# 5: X4 A X5 <NA>
# 6: X5 B <NA> <NA>
# 7: X6 A X7 X8
# 8: X7 B <NA> <NA>
# 9: X8 B <NA> <NA>
# 10: X9 A X10 <NA>
Upvotes: 1
Reputation: 7818
Given your input:
df <- read.table(text = "Time ID Event
7:47:28 X1 A
7:47:30 X2 B
7:48:02 X3 A
7:48:04 X4 A
7:48:05 X5 B
7:50:11 X1 A
7:50:12 X2 B
7:50:15 X5 B
7:55:50 X6 A
7:55:52 X2 B", header = TRUE)
# convert to HMS
df$Time <- lubridate::hms(df$Time)
You can use slide_index_dfr
to capture the ID
s of B
5 seconds ahead and set it up into a dataframe. You can then change the names and add it back to your df
.
xx <- slider::slide_index_dfr(df, df$Time, ~if(.$Event[1] == "A") .$ID[.$Event == "B"] else character(), .after = 5)
colnames(xx) <- paste0("Col", seq_len(ncol(xx)))
cbind(df, xx)
#> Time ID Event Col1 Col2
#> 1 7H 47M 28S X1 A X2 <NA>
#> 2 7H 47M 30S X2 B <NA> <NA>
#> 3 7H 48M 2S X3 A X5 <NA>
#> 4 7H 48M 4S X4 A X5 <NA>
#> 5 7H 48M 5S X5 B <NA> <NA>
#> 6 7H 50M 11S X1 A X2 X5
#> 7 7H 50M 12S X2 B <NA> <NA>
#> 8 7H 50M 15S X5 B <NA> <NA>
#> 9 7H 55M 50S X6 A X2 <NA>
#> 10 7H 55M 52S X2 B <NA> <NA>
Upvotes: 1