AlxRd

Reputation: 285

How to replace values after a specific event in a data frame?

I am trying to replace values in a data frame based on the occurrence of an event. The exp_recode column shows the desired output. time_point gives the temporal ordering of the events in the event column. Within each id, I would like to recode to 0 every event that occurs after z, the event of interest. Note that ids repeat because these are longitudinal data.

If you are wondering why I want to recode / flag the occurrences after z: I plan to remove them later, as those events are not of interest to me, but I don't want to drop them at this stage of the analysis.

id <- c(rep(1, 6),rep(2,4))
time_point <- c(1:6, 1:4)
event <- c("b","b","c","z", "d", "a", "e", "b", "z", "d")
exp_recode <- c("b","b","c","z", "0", "0", "e", "b", "z", "0")
df <- data.frame(id, time_point, event, exp_recode)
df
   id time_point event exp_recode
1   1          1     b          b
2   1          2     b          b
3   1          3     c          c
4   1          4     z          z
5   1          5     d          0
6   1          6     a          0
7   2          1     e          e
8   2          2     b          b
9   2          3     z          z
10  2          4     d          0

Upvotes: 2

Views: 325

Answers (4)

Uwe

Reputation: 42544

For the sake of completeness, here is a data.table solution which uses a non-equi join, update on join, and grouping by .EACHI:

library(data.table)   # CRAN version 1.10.4 used
# coerce to data.table class, 
# coerce to character (only required if event is factor)
setDT(df)[, event := as.character(event)][
  # find all z events
  df[event == "z"], 
  # non-equi join, update all events after z event grouped by id
  on = .(id, time_point > time_point), event := "0", by = .EACHI][]
    id time_point event exp_recode
 1:  1          1     b          b
 2:  1          2     b          b
 3:  1          3     c          c
 4:  1          4     z          z
 5:  1          5     0          0
 6:  1          6     0          0
 7:  2          1     e          e
 8:  2          2     b          b
 9:  2          3     z          z
10:  2          4     0          0
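
The join above keys on time_point rather than on row order, so it does not depend on the rows being sorted. For readers without data.table, roughly the same idea can be sketched in base R. This is an illustrative variant, not part of the original answer: the helper names is_z, z_time and zt are made up here, and it assumes "after z" means a time_point strictly greater than that of the first z within the same id.

```r
id <- c(rep(1, 6), rep(2, 4))
time_point <- c(1:6, 1:4)
event <- c("b", "b", "c", "z", "d", "a", "e", "b", "z", "d")
df <- data.frame(id, time_point, event, stringsAsFactors = FALSE)

# time_point of the first "z" per id (hypothetical helper names)
is_z <- df$event == "z"
z_time <- tapply(df$time_point[is_z], df$id[is_z], min)

# look up each row's z time by id; NA when the id has no "z"
zt <- as.vector(z_time[as.character(df$id)])

# recode only rows that come strictly after the z time of their id
df$exp_recode <- ifelse(!is.na(zt) & df$time_point > zt, "0", df$event)
df$exp_recode
# [1] "b" "b" "c" "z" "0" "0" "e" "b" "z" "0"
```

Unlike the data.table version, this writes the result to a new column and leaves event untouched.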

Upvotes: 1

akuiper

Reputation: 214967

Use base R match to find the first z index and replace everything after it with 0:

zero_after_z <- function(vec) {
    vec_len <- length(vec)
    # index of the first "z"; falls back to the last index if there is none
    first_z <- match("z", vec, nomatch = vec_len)
    if (first_z < vec_len) replace(vec, (first_z + 1):vec_len, "0")
    else vec
}

zero_after_z(c("a", "b", "z", "d"))
# [1] "a" "b" "z" "0"
df$exp_recode <- with(df, ave(event, id, FUN=zero_after_z))

df
#   id time_point event exp_recode
#1   1          1     b          b
#2   1          2     b          b
#3   1          3     c          c
#4   1          4     z          z
#5   1          5     d          0
#6   1          6     a          0
#7   2          1     e          e
#8   2          2     b          b
#9   2          3     z          z
#10  2          4     d          0
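
The nomatch = vec_len argument is what leaves groups without a "z" untouched. A quick check of the edge cases might look like the following, with the function repeated so the snippet runs on its own. The factor example is an added assumption: before R 4.0.0, data.frame() converted strings to factors by default, in which case converting to character first avoids invalid-level NAs.

```r
zero_after_z <- function(vec) {
    vec_len <- length(vec)
    first_z <- match("z", vec, nomatch = vec_len)
    if (first_z < vec_len) replace(vec, (first_z + 1):vec_len, "0") else vec
}

zero_after_z(c("a", "b", "c"))       # no "z": returned unchanged
# [1] "a" "b" "c"
zero_after_z(c("a", "b", "z"))       # "z" is last: nothing to replace
# [1] "a" "b" "z"
zero_after_z(c("z", "a", "z", "b"))  # only the first "z" counts
# [1] "z" "0" "0" "0"

# If event is a factor, convert it to character before calling ave()
event_f <- factor(c("b", "z", "d"))
ave(as.character(event_f), c(1, 1, 1), FUN = zero_after_z)
# [1] "b" "z" "0"
```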

Upvotes: 1

Rui Barradas

Reputation: 76450

Try by. (It took me a while because I was trying ave, forgetting that it returns a numeric vector.)

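fun <- function(x){
    x <- as.character(x)
    i <- match("z", x)            # index of the first "z", NA if there is none
    if (is.na(i)) return(x)       # no "z" in this group: leave it unchanged
    x[seq_along(x) > i] <- "0"    # recode everything after the first "z"
    x
}
# unlist(by(...)) returns values in id order, so rows must already be sorted by id
df$exp_recode2 <- unlist(by(df$event, df$id, FUN = fun))
df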

I bet that there is a simpler dplyr way of doing this, but this one uses base R only.

Upvotes: 2

CPak

Reputation: 13581

It's not pretty but it works. NOTE this works only if you have a single "z" in a group.

Your data (stringsAsFactors=F)

df <- data.frame(id, time_point, event, stringsAsFactors=FALSE)

Using dplyr, first set exp_recode to 1 before "z" and to 0 from "z" onward (1 - cumsum(event == "z")), then put "z" back where event == "z", and finally copy event wherever exp_recode is still 1.

library(dplyr)
df1 <- df %>%
         group_by(id) %>%
         mutate(exp_recode=1-cumsum(event=="z")) %>%
         mutate(exp_recode=ifelse(event=="z", "z", exp_recode)) %>%
         mutate(exp_recode=ifelse(exp_recode==1, event, exp_recode))

Output

      id time_point event exp_recode
 1     1          1     b          b
 2     1          2     b          b
 3     1          3     c          c
 4     1          4     z          z
 5     1          5     d          0
 6     1          6     a          0
 7     2          1     e          e
 8     2          2     b          b
 9     2          3     z          z
10     2          4     d          0
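
If a group can contain more than one "z", the single-z limitation noted above can be avoided by counting how many "z" events precede each row. The following is a base R sketch, not the dplyr pipeline from this answer. The names is_z and z_before are made up for illustration, the sample data is altered so that id 1 has two "z" events, and rows are assumed to be sorted by time within each id. Any later "z" is itself recoded to "0", since it occurs after the first one.

```r
id <- c(rep(1, 6), rep(2, 4))
event <- c("b", "z", "c", "z", "d", "a", "e", "b", "z", "d")  # id 1 has two "z"

# z_before: number of "z" events strictly before each row, within its id
is_z <- as.integer(event == "z")
z_before <- ave(is_z, id, FUN = function(z) cumsum(z) - z)

# keep the event until a "z" has already been seen, then recode to "0"
exp_recode <- ifelse(z_before > 0, "0", event)
exp_recode
# [1] "b" "z" "0" "0" "0" "0" "e" "b" "z" "0"
```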

Upvotes: 1
