R - populate a column for grouped entries based on conditions in subsequent rows of a data.frame

Question

I have a large clinical dataset to which I plan to add several columns. The criteria for each will be almost the same, so it probably comes down to one problem.

So far I have figured out that I first need to group my entries by patient_id, but I have been unable to proceed from there.

Below is a snapshot of the data. When copied and run in R, it creates a data.frame called myDF:

myDF <- structure(list(patient_id = c(1L, 1L, 1L, 1L, 1L), date = structure(c(17167, 
17168, 17169, 17170, 17171), class = "Date"), date_recruited = c("yes", 
"", "", "", ""), ill = c("no", "no", "yes", "yes", "no")), class = "data.frame", .Names = c("id", 
"date", "date_recruited", "ill"), row.names = c(NA, -5L))

I want to create a new column (let's call it "drop") such that, for every id, a row is populated with "drop" if ill == "yes" and its date is 3 days after the recruitment date (the date of the row where date_recruited == "yes").

Something like this:

myDF2 <- structure(list(patient_id = c(1L, 1L, 1L, 1L, 1L), date = structure(c(17167, 
17168, 17169, 17170, 17171), class = "Date"), date_recruited = c("yes", 
"", "", "", ""), ill = c("no", "no", "yes", "yes", "no"), drop = c("", 
"", "", "drop", "")), class = "data.frame", .Names = c("patient_id", 
"date", "date_recruited", "ill", "drop"), row.names = c(NA, -5L
))

Any assistance is welcome...

Andrew Gustar · Accepted Answer

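In dplyr you could do the following. Within each id, recdate is the date of the first row where date_recruited is "yes", and drop is set to "drop" on rows where ill is "yes" and the date is exactly 3 days after recdate.

library(dplyr)

myDF2 <- myDF %>% group_by(id) %>% mutate(recdate=date[which(date_recruited=="yes")[1]],
                                      drop=ifelse(ill=="yes" & date==recdate+3,"drop",""))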
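
If you would rather not depend on dplyr, the same logic can be sketched in base R. This is only an illustration, not part of the original answer: it rebuilds the example data with data.frame(), and the helper names rec and recdate are my own. It assumes each id has at most one row that counts as its recruitment date (the first "yes" is used).

```r
# Rebuild the example data from the question (grouping column named id).
myDF <- data.frame(
  id = rep(1L, 5),
  date = as.Date(17167:17171, origin = "1970-01-01"),
  date_recruited = c("yes", "", "", "", ""),
  ill = c("no", "no", "yes", "yes", "no"),
  stringsAsFactors = FALSE
)

# One recruitment date per id: keep the first row flagged "yes".
rec <- myDF[myDF$date_recruited == "yes", c("id", "date")]
rec <- rec[!duplicated(rec$id), ]

# Look up each row's recruitment date by id (NA if the id was never recruited).
recdate <- rec$date[match(myDF$id, rec$id)]

# Flag rows where the patient is ill exactly 3 days after recruitment.
myDF$drop <- ifelse(
  myDF$ill == "yes" & !is.na(recdate) & myDF$date == recdate + 3,
  "drop", ""
)

print(myDF)
```

On the example data this marks only the fourth row, which matches the drop column requested in the question. Unlike the dplyr version, it does not leave a recdate column in the result.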
