SoccerAnalytics26
SoccerAnalytics26

Reputation: 171

Variable change over time based on event in R

I have some time lapsing data where I want to change specific variables based on an event. With help I was able to figure out how to change the value once, but I want to be able to change it constantly. Below is the data that I start out with.

Time Event1 Event2  ID_1 ID_2 ID_3 ID_4 ID_5
1    0      0       1    1    1    1    0
2    0      0       1    1    1    1    0
3    0      0       1    1    1    1    0
4    0      0       1    1    1    1    0
5    ID_2   ID_5    1    1    1    1    0
6    0      0       1    1    1    1    0
7    ID_5   ID_2    1    1    1    1    0
8    0      0       1    1    1    1    0

What I want to happen, is when an ID_X shows up in Event1, I want to change all future values in that variable to 0 UNTIL they show up again in Event2, in which case I want to flip them back to a 1. Below is what I am looking to do.

UPDATED DATA TABLE

Time Event1 Event2  ID_1 ID_2 ID_3 ID_4 ID_5
1    0      0       1    1    1    1    0
2    0      0       1    1    1    1    0
3    0      0       1    1    1    1    0
4    0      0       1    1    1    1    0
5    ID_2   ID_5    1    0    1    1    1
6    0      0       1    0    1    1    1
7    ID_5   ID_2    1    1    1    1    0
8    0      0       1    1    1    1    0

As you can see, ID_2 shows up in Event1, starts to get 0s, then shows up in Event2 and should be flipped back to 1. The only thing I was able to figure out with some help was how to flip it from 1/0 for the rest of the data. However, I don't think what I did (using melt and reshaping the data) will work when I need to change things back and forth. Any help would be great. Thanks.

Upvotes: 0

Views: 608

Answers (1)

Uwe
Uwe

Reputation: 42564

This can be solved using a combination of cumsum() to record appearances of "ID_2", e.g., in Event1 or Event2, resp., and xor() to flip.

library(data.table)
DT[, ID_2 := 0L + xor(cumsum(Event1 == "ID_2") - cumsum(Event2 == "ID_2"), ID_2)][]
DT[, ID_5 := 0L + xor(cumsum(Event1 == "ID_5") - cumsum(Event2 == "ID_5"), ID_5)][]
    Time Event1 Event2 ID_1 ID_2 ID_3 ID_4 ID_5
 1:    1      0      0    1    1    1    1    0
 2:    2      0      0    1    1    1    1    0
 3:    3      0      0    1    1    1    1    0
 4:    4      0      0    1    1    1    1    0
 5:    5   ID_2   ID_5    1    0    1    1    1
 6:    6      0      0    1    0    1    1    1
 7:    7   ID_5   ID_2    1    1    1    1    0
 8:    8      0      0    1    1    1    1    0
 9:    9   ID_2      0    1    0    1    1    0
10:   10      0      0    1    0    1    1    0
11:   11      0   ID_2    1    1    1    1    0
12:   12      0      0    1    1    1    1    0

Note that additional rows have been added to the sample data set in order to demonstrate that ID_2 flips a second time.

If there are many ID_x columns, this can be repeated for each column conveniently using lapply()

id_cols <- stringr::str_subset(names(DT), "^ID")
invisible(lapply(id_cols, function(col) 
  DT[, (col) := 0L + xor(cumsum(Event1 == col) - cumsum(Event2 == col), get(col))]))
DT[]

The result is the same as above.

data.table is used because it allows to update single columns in place, i.e., without copying the whole data set which saves memory and time. invisible() prevents the result of lapply() to be printed. The result of lapply() is not needed here as DT is updated in place.

Prerequisites

The code is working for the given sample. It relies on the assumption that each ID_x appears alternating in Event1 and Event2 and is likely to fail otherwise.

EDIT: Check for prerequisites

The version below includes a check which halts execution if above prerequisite is violated (not the best self-explanatory error message, though):

id_cols <- stringr::str_subset(names(DT), "^ID")
invisible(lapply(id_cols, function(col) 
  DT[, (col) := {
    tmp <- cumsum(Event1 == col) - cumsum(Event2 == col)
    stopifnot(all(tmp %in% -1:1))
    0L + xor(tmp, get(col))
  }]))
DT[]

Data

As provided by the OP plus four rows added.

library(data.table)
DT <- fread(
  "Time Event1 Event2  ID_1 ID_2 ID_3 ID_4 ID_5
  1    0      0       1    1    1    1    0
  2    0      0       1    1    1    1    0
  3    0      0       1    1    1    1    0
  4    0      0       1    1    1    1    0
  5    ID_2   ID_5    1    1    1    1    0
  6    0      0       1    1    1    1    0
  7    ID_5   ID_2    1    1    1    1    0
  8    0      0       1    1    1    1    0
  9    ID_2   0       1    1    1    1    0
 10    0      0       1    1    1    1    0
 11    0      ID_2    1    1    1    1    0
 12    0      0       1    1    1    1    0"
)

Upvotes: 1

Related Questions