Reputation: 2720
Suppose we start with a data frame that looks like the below:
ID STATE_1 STATE_2
1 1 NULL NULL
2 1 FRY NULL
3 1 NULL CRY
4 1 FRY CRY
5 1 NULL NULL
6 1 NULL NULL
7 1 FRY CRY
8 1 NULL NULL
9 5 NULL NULL
10 5 NULL CRY
11 5 FRY NULL
12 5 NULL NULL
Generated as follows for those who would like to replicate:
ID <- c(1, 1, 1, 1, 1, 1, 1, 1, 5, 5, 5, 5)
STATE_1 <- c("NULL", "FRY", "NULL", "FRY", "NULL", "NULL", "FRY", "NULL", "NULL", "NULL", "FRY", "NULL")
STATE_2 <- c("NULL", "NULL", "CRY", "CRY", "NULL", "NULL", "CRY", "NULL", "NULL", "CRY", "NULL", "NULL")
df <- data.frame(ID, STATE_1, STATE_2)
How would one add a FLAG column, whereby for each ID, once one of first of multiple states is triggered by an element that is not NULL, then all subsequent rows for that ID are flagged with that first non-NULL status? So for example, with the FLAG column added, the output would look like this:
ID STATE_1 STATE_2 FLAG [Explanations for FLAG column elements]
1 1 NULL NULL NULL No flag tripped for ID 1 (yet) so value is NULL
2 1 FRY NULL FRY FRY tripped in STATE_1 so FRY applies for all subsequent rows for this ID
3 1 NULL CRY FRY Ignore CRY since FRY happened first
4 1 FRY CRY FRY
5 1 NULL NULL FRY
6 1 NULL NULL FRY
7 1 FRY CRY FRY
8 1 NULL NULL FRY
9 5 NULL NULL NULL No flag tripped for ID 5 (yet) so value is NULL
10 5 NULL CRY CRY CRY tripped in STATE_2 so CRY applies for all subsequent rows for this ID
11 5 FRY NULL CRY Ignore FRY since CRY happened first
12 5 NULL NULL CRY But please read the following paragraph, because another possible scenario is omitted for sake of brevity
If in row 10 above, for ID 5, there had both appeared FRY and CRY in that same row, then FRY would have governed for all subsequent ID 5 rows. I didn't include this scenario in my example above. So there is a priority where STATE_1 trumps STATE_2, etc.
In my actual data there are 6 states, compared with the 2 states (STATE_1 and STATE_2) in this example.
I have a strong preference for working with dplyr()
.
As I researched this, there are several posts addressing flagging data based on a numeric trigger, but I couldn´t find any solutions using character string triggers like in my data. I am hoping there is a character string solution so I can avoid the added steps of converting from character strings to numeric variables.
Upvotes: 2
Views: 673
Reputation: 2816
Yes, this can be done with dplyr
and character strings. Here's one way:
df %>%
mutate(across(.cols = c("STATE_1", "STATE_2"), ~ na_if(., "NULL"))) %>%
group_by(ID) %>%
fill(STATE_1, STATE_2, .direction = "down") %>%
mutate(flag = if_else(is.na(lag(coalesce(STATE_1, STATE_2))),
coalesce(STATE_1, STATE_2),
NA_character_)) %>%
fill(flag, .direction = "down")
ID STATE_1 STATE_2 flag
1 1 <NA> <NA> <NA>
2 1 FRY <NA> FRY
3 1 FRY CRY FRY
4 1 FRY CRY FRY
5 1 FRY CRY FRY
6 1 FRY CRY FRY
7 1 FRY CRY FRY
8 1 FRY CRY FRY
9 5 <NA> <NA> <NA>
10 5 <NA> CRY CRY
11 5 FRY CRY CRY
12 5 FRY CRY CRY
Step by step, here's what this solution does:
NA
.)ID
so that we operate within each ID separately.fill()
to use each non-null value in STATE_1
and STATE_2
to populate all the non-null values below it, until we reach the next non-null value. (You can also fill "up" instead of down; down is the default direction, but I've included it explicitly for the sake of clarity.)flag
field. If both states in the immediately preceding row have null values (as identified by lag()
), then we want to trigger a new flag; use coalesce()
to prioritize STATE_1
over STATE_2
. Otherwise, we don't want a new flag; populate with NA
for now.fill()
again to populate flags downward: each new flag populates the rows below it until we get to a new flag.This procedure also works for the scenario you described where row 10 has "FRY" for STATE_1
:
df2 = df
df2$STATE_1[10] = "FRY"
df2 %>%
mutate(across(.cols = -c("ID"), ~ na_if(., "NULL"))) %>%
group_by(ID) %>%
fill(STATE_1, STATE_2, .direction = "down") %>%
mutate(flag = if_else(is.na(lag(coalesce(STATE_1, STATE_2))),
coalesce(STATE_1, STATE_2),
NA_character_)) %>%
fill(flag, .direction = "down")
ID STATE_1 STATE_2 flag
1 1 <NA> <NA> <NA>
2 1 FRY <NA> FRY
3 1 FRY CRY FRY
4 1 FRY CRY FRY
5 1 FRY CRY FRY
6 1 FRY CRY FRY
7 1 FRY CRY FRY
8 1 FRY CRY FRY
9 5 <NA> <NA> <NA>
10 5 FRY CRY FRY
11 5 FRY CRY FRY
12 5 FRY CRY FRY
Upvotes: 3