Reputation: 17
I am trying to create a series of dataframes, each a subset of a larger dataframe by date range (2-year blocks), in order to run a separate survival analysis on each new dataframe. I cannot use "split" to split the dataframe on a single factor, because some rows need to appear in more than one subset.
I have some example data as follows:
library(dplyr)  # needed for %>% and mutate_at() below
Patient <- c(1:10)
First.Appt <- c("2014-01-01","2014-03-02","2015-05-17","2015-06-03","2016-01-12","2016-11-07","2017-07-08","2017-09-09","2018-04-12","2018-05-13")
DOD <- c("2014-01-29","2014-03-30","2015-06-14","2015-07-01","2016-02-09","2016-12-05","2017-08-05","2017-10-07","2018-05-10","2018-06-10")
First.Appt.Year <- c(2014,2014,2015,2015,2016,2016,2017,2017,2018,2018)
df <- as.data.frame(cbind(Patient, First.Appt, DOD, First.Appt.Year)) %>%
  mutate_at("First.Appt.Year", as.numeric)
I have created a start date (the minimum First.Appt.Year), the final start date (maximum First.Appt.Year - 1), and then a vector containing all my start dates from which to subset full 2-year blocks as follows:
Start.year <- as.numeric(min(df$First.Appt.Year))
Final.start.year <- max(df$First.Appt.Year) - 1
Start.vec <- c(Start.year:Final.start.year)
I thought to use a for loop with lapply to create a subset based on First.Appt.Year falling within the range of Start.vec and Start.vec + 1, for each value of Start.vec as follows:
for (i in 1:length(Start.vec)){
  new.df = lapply(Start.vec, function(x)
    subset(df, First.Appt.Year == Start.vec[i] | First.Appt.Year == Start.vec[i] + 1))
}
This almost works, but instead of creating four different dataframes (e.g. 2014-2015, 2015-2016, 2016-2017 and 2017-2018), all four of the dataframes in the output list only contain 2017-2018 values as below.
Patient | First.Appt | DOD | First.Appt.Year |
---|---|---|---|
7 | 08/07/2017 | 05/08/2017 | 2017 |
8 | 09/09/2017 | 07/10/2017 | 2017 |
9 | 12/04/2018 | 10/05/2018 | 2018 |
10 | 13/05/2018 | 10/06/2018 | 2018 |
Can anyone help me with what I am doing wrong and how to return the different subsets into each list object?
If there are other ways of doing this that seem more logical then please let me know.
Upvotes: 0
Views: 297
Reputation: 318
It looks like a simple misunderstanding about how lapply works: it already iterates over Start.vec and passes each value to the function as x, so you don't need to wrap it in a for loop. Just replace your last block with:
new.df = lapply(Start.vec, function(x) subset(df, First.Appt.Year == x | First.Appt.Year == x + 1))
And that should work. At least, it does on my side.
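To see why the original loop returns four identical dataframes: the anonymous function never uses its argument x, so each lapply call builds one copy of the subset for the current i per element of Start.vec, and every pass of the loop overwrites new.df. Here is a minimal sketch of that, using only the Patient and First.Appt.Year columns from the question (the names buggy and fixed are just for illustration):

```r
# Example data reduced to the two columns needed here (values from the question).
df <- data.frame(
  Patient = 1:10,
  First.Appt.Year = c(2014, 2014, 2015, 2015, 2016, 2016, 2017, 2017, 2018, 2018)
)
Start.vec <- min(df$First.Appt.Year):(max(df$First.Appt.Year) - 1)

# Original pattern: x is ignored, so every list element depends only on i,
# and each pass of the loop overwrites the previous result.
for (i in seq_along(Start.vec)) {
  buggy <- lapply(Start.vec, function(x)
    subset(df, First.Appt.Year == Start.vec[i] | First.Appt.Year == Start.vec[i] + 1))
}

# Fixed pattern: lapply already iterates over Start.vec, so use x directly.
fixed <- lapply(Start.vec, function(x)
  subset(df, First.Appt.Year == x | First.Appt.Year == x + 1))

# Years present in each element of the two lists
lapply(buggy, function(d) unique(d$First.Appt.Year))
lapply(fixed, function(d) unique(d$First.Appt.Year))
```

After the loop finishes, i holds the last index, which is why every element of the original output covers 2017-2018.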
Upvotes: 0
Reputation: 4658
You are close! Instead of using both the for loop and lapply, you need only one.
For example, with lapply:
new.df <- lapply(Start.vec, function(x) subset(df, First.Appt.Year == x | First.Appt.Year == x + 1))
And using only the for loop:
df_list <- list()
for (i in seq_along(Start.vec)) {
  # Subset the rows for this 2-year block and append them to the list
  new.df <- subset(df, First.Appt.Year == Start.vec[i] | First.Appt.Year == Start.vec[i] + 1)
  df_list <- c(df_list, list(new.df))
}
df_list
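Since the question also asks about other approaches, one possible variation on the lapply version is to name each list element after its 2-year block, so the subsets are easier to tell apart later. This is only a sketch: the name blocks and the label format "2014-2015" are my own choices, and the example data is cut down to two columns.

```r
# Reduced example data: same patients and years as in the question.
df <- data.frame(Patient = 1:10, First.Appt.Year = rep(2014:2018, each = 2))
Start.vec <- min(df$First.Appt.Year):(max(df$First.Appt.Year) - 1)

# One subset per start year; %in% replaces the two == comparisons.
blocks <- lapply(Start.vec, function(x) df[df$First.Appt.Year %in% c(x, x + 1), ])

# Label each element with its block, e.g. "2014-2015".
names(blocks) <- paste(Start.vec, Start.vec + 1, sep = "-")

names(blocks)
blocks[["2016-2017"]]$Patient
```

Each element can then be picked out by its label, or the whole list can be passed to another lapply that runs the per-block analysis.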
Upvotes: 0