Reputation: 285
I am trying to replace values in a data frame based on the occurrence of an event. The column exp_recode shows the desired output. time_point gives the time of each event, which determines the temporal ordering of the values in the event column. I would like to recode to 0 any event that occurs after z, the event of interest. Note that id values repeat, as these are longitudinal data.
If you are wondering why I want to recode / flag the occurrences after z, I am planning to remove them down the road, as they are not events of interest to me. But I don't want to drop them at this stage of the analysis.
id <- c(rep(1, 6),rep(2,4))
time_point <- c(1:6, 1:4)
event <- c("b","b","c","z", "d", "a", "e", "b", "z", "d")
exp_recode <- c("b","b","c","z", "0", "0", "e", "b", "z", "0")
df <- data.frame(id, time_point, event, exp_recode)
df
id time_point event exp_recode
1 1 1 b b
2 1 2 b b
3 1 3 c c
4 1 4 z z
5 1 5 d 0
6 1 6 a 0
7 2 1 e e
8 2 2 b b
9 2 3 z z
10 2 4 d 0
Upvotes: 2
Views: 325
Reputation: 42544
For the sake of completeness, here is a data.table solution which uses a non-equi join, update on join, and grouping by .EACHI:
library(data.table) # CRAN version 1.10.4 used
# coerce to data.table class,
# coerce to character (only required if event is factor)
setDT(df)[, event := as.character(event)][
# find all z events
df[event == "z"],
# non-equi join, update all events after z event grouped by id
on = .(id, time_point > time_point), event := "0", by = .EACHI][]
id time_point event exp_recode
1: 1 1 b b
2: 1 2 b b
3: 1 3 c c
4: 1 4 z z
5: 1 5 0 0
6: 1 6 0 0
7: 2 1 e e
8: 2 2 b b
9: 2 3 z z
10: 2 4 0 0
Upvotes: 1
Reputation: 214967
Use base R match to find the index of the first z and replace everything after it with 0:
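zero_after_z <- function(vec) {
  vec_len <- length(vec)
  # index of the first "z"; falls back to the last index if there is none
  first_z <- match("z", vec, nomatch = vec_len)
  # only recode when something follows the first "z"
  if (first_z < vec_len) replace(vec, (first_z + 1):vec_len, "0")
  else vec
}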
zero_after_z(c("a", "b", "z", "d"))
# [1] "a" "b" "z" "0"
# as.character() only matters if event was read in as a factor
df$exp_recode <- with(df, ave(as.character(event), id, FUN = zero_after_z))
df
# id time_point event exp_recode
#1 1 1 b b
#2 1 2 b b
#3 1 3 c c
#4 1 4 z z
#5 1 5 d 0
#6 1 6 a 0
#7 2 1 e e
#8 2 2 b b
#9 2 3 z z
#10 2 4 d 0
Upvotes: 1
Reputation: 76450
Try by. (It took me a while because I was trying ave, forgetting that it returns a numeric vector.)
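fun <- function(x){
  x <- as.character(x)
  # position of the first "z"; NA if the group has none
  i <- which(x == "z")[1]
  # recode everything after the first "z", leave groups without "z" untouched
  if (!is.na(i)) x[seq_along(x)[-seq_len(i)]] <- "0"
  x
}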
df$exp_recode2 <- unlist(by(df$event, df$id, FUN = fun))
df
I bet that there is a simpler dplyr way of doing this, but this one uses base R only.
Upvotes: 2
Reputation: 13581
It's not pretty, but it works. NOTE: this works only if you have a single "z" in each group.
Your data (with stringsAsFactors=F):
df <- data.frame(id, time_point, event, stringsAsFactors=F)
Using dplyr, set exp_recode to 1 for rows before "z" and to 0 from "z" onwards, then change exp_recode to "z" where event=="z", and finally change exp_recode to event where exp_recode==1.
library(dplyr)
df1 <- df %>%
group_by(id) %>%
mutate(exp_recode=1-cumsum(event=="z")) %>%
mutate(exp_recode=ifelse(event=="z", "z", exp_recode)) %>%
mutate(exp_recode=ifelse(exp_recode==1, event, exp_recode))
Output
id time_point event exp_recode
1 1 1 b b
2 1 2 b b
3 1 3 c c
4 1 4 z z
5 1 5 d 0
6 1 6 a 0
7 2 1 e e
8 2 2 b b
9 2 3 z z
10 2 4 d 0
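If a group can contain more than one "z", one way around the caveat above might be to count, within each id, how many "z" events occurred strictly before each row and recode wherever that count is positive. This is only a sketch in base R, not part of the original answer: it assumes rows are already ordered by time_point within each id, the names is_z and z_before are made up for illustration, and any later "z" is itself recoded to "0" because it comes after the first one.

```r
# Toy data from the question, with event kept as character
df <- data.frame(
  id         = c(rep(1, 6), rep(2, 4)),
  time_point = c(1:6, 1:4),
  event      = c("b", "b", "c", "z", "d", "a", "e", "b", "z", "d"),
  stringsAsFactors = FALSE
)

# 1 where the row is a "z" event, 0 otherwise
is_z <- as.integer(df$event == "z")

# Per id: number of "z" events seen strictly before the current row
# (running count of "z" minus the current row's own contribution)
z_before <- ave(is_z, df$id, FUN = function(v) cumsum(v) - v)

# Recode every row that has at least one earlier "z" in its group
df$exp_recode <- ifelse(z_before > 0, "0", df$event)
df
```

Groups without any "z" are left unchanged, since the count never becomes positive.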
Upvotes: 1