Reputation: 171
I have some time-lapse data where I want to change specific variables based on an event. With help I was able to figure out how to change the value once, but I want to be able to change it repeatedly. Below is the data that I start out with.
Time Event1 Event2 ID_1 ID_2 ID_3 ID_4 ID_5
1 0 0 1 1 1 1 0
2 0 0 1 1 1 1 0
3 0 0 1 1 1 1 0
4 0 0 1 1 1 1 0
5 ID_2 ID_5 1 1 1 1 0
6 0 0 1 1 1 1 0
7 ID_5 ID_2 1 1 1 1 0
8 0 0 1 1 1 1 0
What I want to happen is this: when an ID_X shows up in Event1, I want to change all future values in that variable to 0 UNTIL it shows up again in Event2, at which point I want to flip them back to 1. Below is what I am looking to do.
UPDATED DATA TABLE
Time Event1 Event2 ID_1 ID_2 ID_3 ID_4 ID_5
1 0 0 1 1 1 1 0
2 0 0 1 1 1 1 0
3 0 0 1 1 1 1 0
4 0 0 1 1 1 1 0
5 ID_2 ID_5 1 0 1 1 1
6 0 0 1 0 1 1 1
7 ID_5 ID_2 1 1 1 1 0
8 0 0 1 1 1 1 0
As you can see, ID_2 shows up in Event1, starts to get 0s, then shows up in Event2 and should be flipped back to 1. The only thing I was able to figure out with some help was how to flip it from 1/0 for the rest of the data. However, I don't think what I did (using melt and reshaping the data) will work when I need to change things back and forth. Any help would be great. Thanks.
Upvotes: 0
Views: 608
Reputation: 42564
This can be solved using a combination of cumsum(), to record appearances of "ID_2" in Event1 or Event2, respectively, and xor(), to flip the value.
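The mechanics can be sketched on a toy vector in base R (the data below is made up for illustration and is not the OP's table):

```r
# Hypothetical example: ID_2 appears in Event1 at row 3 and in Event2 at row 5
event1 <- c("0", "0", "ID_2", "0", "0", "0")
event2 <- c("0", "0", "0", "0", "ID_2", "0")
id     <- c(1L, 1L, 1L, 1L, 1L, 1L)

# cumsum() counts appearances; the difference is 1 while the ID is "switched off"
state <- cumsum(event1 == "ID_2") - cumsum(event2 == "ID_2")
# state: 0 0 1 1 0 0

# xor() flips the original value wherever state is 1; 0L + turns logical into integer
result <- 0L + xor(state, id)
# result: 1 1 0 0 1 1
```

The rows between the Event1 and Event2 appearances are flipped to 0, and the rows after the Event2 appearance revert to the original value.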
library(data.table)
DT[, ID_2 := 0L + xor(cumsum(Event1 == "ID_2") - cumsum(Event2 == "ID_2"), ID_2)][]
DT[, ID_5 := 0L + xor(cumsum(Event1 == "ID_5") - cumsum(Event2 == "ID_5"), ID_5)][]
    Time Event1 Event2 ID_1 ID_2 ID_3 ID_4 ID_5
 1:    1      0      0    1    1    1    1    0
 2:    2      0      0    1    1    1    1    0
 3:    3      0      0    1    1    1    1    0
 4:    4      0      0    1    1    1    1    0
 5:    5   ID_2   ID_5    1    0    1    1    1
 6:    6      0      0    1    0    1    1    1
 7:    7   ID_5   ID_2    1    1    1    1    0
 8:    8      0      0    1    1    1    1    0
 9:    9   ID_2      0    1    0    1    1    0
10:   10      0      0    1    0    1    1    0
11:   11      0   ID_2    1    1    1    1    0
12:   12      0      0    1    1    1    1    0
Note that additional rows have been added to the sample data set in order to demonstrate that ID_2 flips a second time.
If there are many ID_x columns, this can be repeated conveniently for each column using lapply():
id_cols <- stringr::str_subset(names(DT), "^ID")
invisible(lapply(id_cols, function(col)
DT[, (col) := 0L + xor(cumsum(Event1 == col) - cumsum(Event2 == col), get(col))]))
DT[]
The result is the same as above.
data.table is used because it allows updating single columns in place, i.e., without copying the whole data set, which saves memory and time. invisible() prevents the result of lapply() from being printed; that result is not needed here, as DT is updated in place.
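Update by reference can be sketched in a minimal example (assuming data.table is installed; the column name is illustrative):

```r
library(data.table)

DT2 <- data.table(x = 1:3)

# := modifies the column inside DT2 by reference; no copy of the table is made,
# so nothing needs to be assigned back with <-
DT2[, x := x * 2L]

DT2$x
# 2 4 6
```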
The code works for the given sample. It relies on the assumption that each ID_x appears alternately in Event1 and Event2, and is likely to fail otherwise.
The version below includes a check which halts execution if the above prerequisite is violated (not the most self-explanatory error message, though):
id_cols <- stringr::str_subset(names(DT), "^ID")
invisible(lapply(id_cols, function(col)
DT[, (col) := {
tmp <- cumsum(Event1 == col) - cumsum(Event2 == col)
stopifnot(all(tmp %in% -1:1))
0L + xor(tmp, get(col))
}]))
DT[]
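To see why the check works: if an ID appeared twice in Event1 without an intervening Event2 entry, the running difference would reach 2 and stopifnot() would halt. A toy base-R illustration (made-up vectors, not the OP's data):

```r
# ID_2 appears twice in Event1 with no Event2 appearance in between
event1 <- c("ID_2", "0", "ID_2", "0")
event2 <- c("0", "0", "0", "0")

tmp <- cumsum(event1 == "ID_2") - cumsum(event2 == "ID_2")
# tmp: 1 1 2 2 -- the value 2 signals a violated prerequisite

all(tmp %in% -1:1)
# FALSE, so stopifnot(all(tmp %in% -1:1)) would stop with an error
```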
Data: as provided by the OP, plus four rows added.
library(data.table)
DT <- fread(
"Time Event1 Event2 ID_1 ID_2 ID_3 ID_4 ID_5
1 0 0 1 1 1 1 0
2 0 0 1 1 1 1 0
3 0 0 1 1 1 1 0
4 0 0 1 1 1 1 0
5 ID_2 ID_5 1 1 1 1 0
6 0 0 1 1 1 1 0
7 ID_5 ID_2 1 1 1 1 0
8 0 0 1 1 1 1 0
9 ID_2 0 1 1 1 1 0
10 0 0 1 1 1 1 0
11 0 ID_2 1 1 1 1 0
12 0 0 1 1 1 1 0"
)
Upvotes: 1