Reputation: 83
I have two dataframes one with unique id and the other with multiple ids. I want to create another column in the first dataframe if the value in the second dataframe on or prior to first dataframe date has a value of 1, and if the id is missing in the second dataframe i want to assign NA. What would be the most efficient way to do it in R?
# first dataframe with each unique id
set.seed(123)
df <- data.frame(id = c(1:7),
date = seq(as.Date("2020/12/26"),
as.Date("2021/1/1"), by = "day"))
#s second dataframe with repeated id
df1 <- data.frame(id = rep(1:5, each = 5),
date = sample(seq(as.Date('2020/12/20'),
as.Date('2021/1/15'), by="day"), 25),
assign = sample(c(0,1), replace=TRUE, size=25))
df1 <- arrange(df1,id, date)
# the output that I want
df$response <- c(1,0,0,1,0,NA,NA)
Upvotes: 1
Views: 43
Reputation: 886948
May be we can use a join
library(data.table)
df2 <- setDT(df1)[as.logical(assign)]
setDT(df)[df2, response := assign, on = .(id, date), roll = -Inf]
df[is.na(response) & id %in% df2$id, response := 0]
-output
df
# id date response
#1: 1 2020-12-26 1
#2: 2 2020-12-27 0
#3: 3 2020-12-28 0
#4: 4 2020-12-29 1
#5: 5 2020-12-30 0
#6: 6 2020-12-31 NA
#7: 7 2021-01-01 NA
Upvotes: 2