Create list of dataframes subset by date range

Question

I am trying to create a series of dataframes which are subset from a larger dataframe by a date range (2-year blocks), in order to do a separate survival analysis for each new dataframe. I cannot use "split" to split the dataframe based on one factor, as the blocks overlap and some rows will need to be present in more than one subset.

I have some example data as follows:

Patient <- c(1:10)
First.Appt <- c("2014-01-01","2014-03-02","2015-05-17","2015-06-03","2016-01-12","2016-11-07","2017-07-08","2017-09-09","2018-04-12","2018-05-13")
DOD <- c("2014-01-29","2014-03-30","2015-06-14","2015-07-01","2016-02-09","2016-12-05","2017-08-05","2017-10-07","2018-05-10","2018-06-10")
First.Appt.Year <- c(2014,2014,2015,2015,2016,2016,2017,2017,2018,2018)

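library(dplyr)  # provides %>% and mutate_at

# cbind() turns every column into character, so the year is converted back to numeric
df <- as.data.frame(cbind(Patient, First.Appt, DOD, First.Appt.Year)) %>%
  mutate_at("First.Appt.Year", as.numeric)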

I have created a start date (the minimum First.Appt.Year), the final start date (maximum First.Appt.Year - 1), and then a vector containing all my start dates from which to subset full 2-year blocks as follows:

Start.year <- as.numeric(min(df$First.Appt.Year))

Final.start.year <- max(df$First.Appt.Year) - 1

Start.vec <- c(Start.year:Final.start.year)

I thought to use a for loop with lapply to create a subset based on First.Appt.Year falling within the range of Start.vec and Start.vec + 1, for each value of Start.vec as follows:

for (i in 1:length(Start.vec)){
new.df = lapply(Start.vec, function(x) 
subset(df, First.Appt.Year == Start.vec[i] | First.Appt.Year == Start.vec[i] + 1))
}

This almost works, but instead of creating four different dataframes (e.g. 2014-2015, 2015-2016, 2016-2017 and 2017-2018), all four of the dataframes in the output list only contain 2017-2018 values as below.

Patient	First.Appt	DOD	First.Appt.Year
7	08/07/2017	05/08/2017	2017
8	09/09/2017	07/10/2017	2017
9	12/04/2018	10/05/2018	2018
10	13/05/2018	10/06/2018	2018

Can anyone help me with what I am doing wrong and how to return the different subsets into each list object?

If there are other ways of doing this that seem more logical then please let me know.

Comevussor · Accepted Answer

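It looks like a simple misunderstanding about the use of lapply. You don't need to wrap it in a for loop: lapply already calls the function once for each element of Start.vec, passing that element in as x. Your function ignores x and uses Start.vec[i] instead, so all four list elements are the same subset, and each pass of the for loop overwrites new.df, leaving only the result for the last i (2017-2018). Just replace your last block with: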

new.df = lapply(Start.vec, function(x) subset(df, First.Appt.Year == x | First.Appt.Year == x + 1))

And that should work. At least, it does on my side.
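For anyone who wants to check this end to end, here is a minimal sketch in base R. It is an illustration under a few assumptions, not part of the original code: the data frame is cut down to the two columns the subsetting needs, `%in%` stands in for the two `==` comparisons, and the block labels added with `names()` are my own addition.

```r
# Cut-down version of the example data: only the columns the subsetting needs.
df <- data.frame(
  Patient = 1:10,
  First.Appt.Year = rep(2014:2018, each = 2)
)

# Start years of every full 2-year block: 2014, 2015, 2016, 2017.
Start.vec <- min(df$First.Appt.Year):(max(df$First.Appt.Year) - 1)

# One subset per start year; x is the start year that lapply passes in.
new.df <- lapply(Start.vec, function(x) {
  df[df$First.Appt.Year %in% c(x, x + 1), ]
})

# Label each list element with its block (assumed naming scheme, not in the question).
names(new.df) <- paste(Start.vec, Start.vec + 1, sep = "-")

# Number of rows in each block.
sapply(new.df, nrow)
```

Each element can then be pulled out by label, for example new.df[["2015-2016"]], and the same lapply pattern can be used to run the survival analysis on every block.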
