subset dataframe based on conditions in vector

Question

I have two dataframes:

#df1
type <- c("A", "B", "C")
day_start <- c(5,8,4)
day_end <- c(12,10,11)
df1 <- cbind.data.frame(type, day_start, day_end)
df1
  type day_start day_end
1    A         5      12
2    B         8      10
3    C         4      11

#df2
value <- 1:10
day <- 4:13
df2 <- cbind.data.frame(day, value)
df2
   day value
1    4     1
2    5     2
3    6     3
4    7     4
5    8     5
6    9     6
7   10     7
8   11     8
9   12     9
10  13    10

I would like to subset df2 so that each level of the factor "type" in df1 gets its own dataframe, containing only the rows whose day falls between day_start and day_end (inclusive) for that level.

The desired outcome for "A" would be:

list_of_dataframes$df_A
   day value
1    5     2
2    6     3
3    7     4
4    8     5
5    9     6
6   10     7
7   11     8
8   12     9

I found a related question on SO whose answer suggests using mapply(), but I cannot figure out how to adapt the code given there to fit my data and desired outcome. Can someone help me out?

Thomas · Accepted Answer

The following solution assumes that all day values are integers. If that assumption holds, it's an easy one-liner:

> apply(df1, 1, function(x) df2[df2$day %in% x[2]:x[3],])
[[1]]
  day value
2   5     2
3   6     3
4   7     4
5   8     5
6   9     6
7  10     7
8  11     8
9  12     9

[[2]]
  day value
5   8     5
6   9     6
7  10     7

[[3]]
  day value
1   4     1
2   5     2
3   6     3
4   7     4
5   8     5
6   9     6
7  10     7
8  11     8

You can use setNames to name the dataframes in the list:

setNames(apply(df1, 1, function(x) df2[df2$day %in% x[2]:x[3],]),df1[,1])
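
To get the exact access pattern from the question (list_of_dataframes$df_A, with rows renumbered from 1), something along these lines should work. This is a sketch under assumptions: the "df_" prefix and the row renumbering are taken from the desired output in the question, and the lapply() loop over row indices is just one possible way to write it.

```r
# Rebuild the example data from the question.
df1 <- data.frame(type = c("A", "B", "C"),
                  day_start = c(5, 8, 4),
                  day_end = c(12, 10, 11))
df2 <- data.frame(day = 4:13, value = 1:10)

# One subset of df2 per row of df1, selected by an inclusive day range.
list_of_dataframes <- lapply(seq_len(nrow(df1)), function(i) {
  out <- df2[df2$day >= df1$day_start[i] & df2$day <= df1$day_end[i], ]
  rownames(out) <- NULL  # renumber rows from 1, as in the desired output
  out
})

# Name each element after its type, with the "df_" prefix used in the question.
names(list_of_dataframes) <- paste0("df_", df1$type)

list_of_dataframes$df_A  # days 5 through 12
```

Dropping the rownames(out) <- NULL line keeps the original row numbers of df2, which can be handy for tracing rows back to the source.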
