embee

Reputation: 17

Create list of dataframes subset by date range

I am trying to create a series of dataframes, each a subset of a larger dataframe covering a 2-year date range, so that I can run a separate survival analysis on each one. I cannot use split() to split the dataframe by a single factor, because some rows need to appear in more than one subset.

I have some example data as follows:

library(dplyr)  # provides %>% and mutate_at()

Patient <- c(1:10)
First.Appt <- c("2014-01-01","2014-03-02","2015-05-17","2015-06-03","2016-01-12","2016-11-07","2017-07-08","2017-09-09","2018-04-12","2018-05-13")
DOD <- c("2014-01-29","2014-03-30","2015-06-14","2015-07-01","2016-02-09","2016-12-05","2017-08-05","2017-10-07","2018-05-10","2018-06-10")
First.Appt.Year <- c(2014,2014,2015,2015,2016,2016,2017,2017,2018,2018)

# cbind() builds a character matrix, so First.Appt.Year is converted back to numeric
df <- as.data.frame(cbind(Patient, First.Appt, DOD, First.Appt.Year)) %>%
  mutate_at("First.Appt.Year", as.numeric)

I have created a start year (the minimum First.Appt.Year), a final start year (the maximum First.Appt.Year - 1), and then a vector of all the start years from which to subset full 2-year blocks:

Start.year <- as.numeric(min(df$First.Appt.Year))

Final.start.year <- max(df$First.Appt.Year) - 1

Start.vec <- c(Start.year:Final.start.year)

I tried using a for loop with lapply to subset the rows whose First.Appt.Year equals Start.vec[i] or Start.vec[i] + 1, for each value of Start.vec:

for (i in 1:length(Start.vec)){
new.df = lapply(Start.vec, function(x) 
subset(df, First.Appt.Year == Start.vec[i] | First.Appt.Year == Start.vec[i] + 1))
}

This almost works, but instead of creating four different dataframes (i.e. 2014-2015, 2015-2016, 2016-2017 and 2017-2018), all four dataframes in the output list contain only the 2017-2018 rows, as below.

   Patient First.Appt        DOD First.Appt.Year
7        7 2017-07-08 2017-08-05            2017
8        8 2017-09-09 2017-10-07            2017
9        9 2018-04-12 2018-05-10            2018
10      10 2018-05-13 2018-06-10            2018

Can anyone explain what I am doing wrong, and how to store each subset as a separate element of the list?

If there is a more logical way to do this, please let me know.

Upvotes: 0

Views: 297

Answers (2)

Comevussor

Reputation: 318

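It looks like a simple misunderstanding about how lapply works. Your anonymous function ignores its argument x and uses Start.vec[i] instead, so all four list elements are built from the same start year; and because new.df is overwritten on every pass of the for loop, only the last pass (i = 4, the 2017-2018 block) survives. You don't need to wrap lapply in a for loop at all. Just replace your last block with: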

new.df = lapply(Start.vec, function(x) subset(df, First.Appt.Year == x | First.Appt.Year == x + 1))

That should work; at least, it does on my machine.
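To see why the original loop returned the 2017-2018 block four times, here is a minimal sketch on a toy vector. The names `years`, `wrong` and `right` are hypothetical, and `i <- 4` simply mimics the value `i` holds on the last pass of the for loop.

```r
years <- c(2014, 2015, 2016, 2017)
i <- 4  # the value i holds on the final pass of the original for loop

# The function ignores x and reads years[i], so every element is identical
wrong <- lapply(years, function(x) c(years[i], years[i] + 1))

# Using the argument x gives one 2-year block per start year
right <- lapply(years, function(x) c(x, x + 1))

str(wrong)
str(right)
```

Every element of `wrong` is `c(2017, 2018)`, while `right` runs from `c(2014, 2015)` to `c(2017, 2018)`.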

Upvotes: 0

Bas

Reputation: 4658

You are close! Instead of using both the for loop and the lapply, you need only one.

For example, with the lapply:

new.df <- lapply(Start.vec, function(x) subset(df, First.Appt.Year == x | First.Appt.Year == x + 1))
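If it helps to keep track of which block is which, one possible variation (a sketch, not part of the original answers) tests the year with %in% and names each list element after its range. The "2014-2015" naming scheme is just an illustrative choice, and data.frame() is used here instead of cbind() so that the column types are kept.

```r
# Example data from the question
df <- data.frame(
  Patient = 1:10,
  First.Appt = c("2014-01-01","2014-03-02","2015-05-17","2015-06-03","2016-01-12",
                 "2016-11-07","2017-07-08","2017-09-09","2018-04-12","2018-05-13"),
  DOD = c("2014-01-29","2014-03-30","2015-06-14","2015-07-01","2016-02-09",
          "2016-12-05","2017-08-05","2017-10-07","2018-05-10","2018-06-10"),
  First.Appt.Year = c(2014, 2014, 2015, 2015, 2016, 2016, 2017, 2017, 2018, 2018)
)

Start.vec <- min(df$First.Appt.Year):(max(df$First.Appt.Year) - 1)

# One subset per 2-year block; %in% checks membership in c(x, x + 1)
new.df <- lapply(Start.vec, function(x) subset(df, First.Appt.Year %in% c(x, x + 1)))

# Label each element with its block, e.g. "2014-2015"
names(new.df) <- paste(Start.vec, Start.vec + 1, sep = "-")

new.df[["2015-2016"]]
```

Each block can then be retrieved by name rather than by position.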

And using only the for loop:

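df_list <- vector("list", length(Start.vec))  # preallocate one slot per start year

for (i in seq_along(Start.vec)) {
  # rows whose First.Appt.Year is Start.vec[i] or the following year
  df_list[[i]] <- subset(df, First.Appt.Year == Start.vec[i] | First.Appt.Year == Start.vec[i] + 1)
}

df_list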
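Once the list exists, the same per-block analysis can be run on every element by passing the list to lapply() again. In this sketch the helper `follow_up_days()` is hypothetical: it only computes days from First.Appt to DOD and stands in for whatever survival model you actually fit to each block.

```r
# Example data from the question
df <- data.frame(
  Patient = 1:10,
  First.Appt = c("2014-01-01","2014-03-02","2015-05-17","2015-06-03","2016-01-12",
                 "2016-11-07","2017-07-08","2017-09-09","2018-04-12","2018-05-13"),
  DOD = c("2014-01-29","2014-03-30","2015-06-14","2015-07-01","2016-02-09",
          "2016-12-05","2017-08-05","2017-10-07","2018-05-10","2018-06-10"),
  First.Appt.Year = c(2014, 2014, 2015, 2015, 2016, 2016, 2017, 2017, 2018, 2018)
)

Start.vec <- 2014:2017
df_list <- lapply(Start.vec, function(x) subset(df, First.Appt.Year %in% c(x, x + 1)))

# Hypothetical helper standing in for the per-block analysis:
# follow-up time in days from first appointment to date of death
follow_up_days <- function(d) as.numeric(as.Date(d$DOD) - as.Date(d$First.Appt))

# Apply the same analysis to every 2-year block
results <- lapply(df_list, follow_up_days)
results[[1]]
```

In the example data every patient happens to have 28 days of follow-up, so each element of results is c(28, 28, 28, 28).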

Upvotes: 0
