Learner
Learner

Reputation: 83

I want to created another column in the first dataframe based on values of second column prior to or on the date of first dataframe for each group

I have two dataframes one with unique id and the other with multiple ids. I want to create another column in the first dataframe if the value in the second dataframe on or prior to first dataframe date has a value of 1, and if the id is missing in the second dataframe i want to assign NA. What would be the most efficient way to do it in R?

# first dataframe with each unique id 
set.seed(123)
df <- data.frame(id = c(1:7),
                 date = seq(as.Date("2020/12/26"),
                            as.Date("2021/1/1"), by = "day"))

#s second dataframe with repeated id 
df1 <- data.frame(id = rep(1:5, each = 5),
                  date = sample(seq(as.Date('2020/12/20'), 
                                    as.Date('2021/1/15'), by="day"), 25),
                  assign = sample(c(0,1), replace=TRUE, size=25))

df1 <- arrange(df1,id, date)

# the output that I want 
df$response <- c(1,0,0,1,0,NA,NA)

Upvotes: 1

Views: 43

Answers (1)

akrun
akrun

Reputation: 886948

May be we can use a join

library(data.table)
df2 <- setDT(df1)[as.logical(assign)]
setDT(df)[df2, response := assign, on = .(id, date), roll = -Inf]
df[is.na(response) & id %in% df2$id, response := 0]

-output

df
#   id       date response
#1:  1 2020-12-26        1
#2:  2 2020-12-27        0
#3:  3 2020-12-28        0
#4:  4 2020-12-29        1
#5:  5 2020-12-30        0
#6:  6 2020-12-31       NA
#7:  7 2021-01-01       NA

Upvotes: 2

Related Questions