Filter dataframe based on a date that may or may not be contained in the dataframe

Question

I have a dataframe (df) like the following:

    derv        market       date
 -10.7803563 S&P 500 Index 2008-01-02
 -15.6922552 S&P 500 Index 2008-01-03
 -15.7648483 S&P 500 Index 2008-01-04
 -10.2294744 S&P 500 Index 2008-01-07
  -0.5918593 S&P 500 Index 2008-01-08
   8.1518987 S&P 500 Index 2008-01-09
 .....
  84.1518987 S&P 500 Index 2014-12-31

and I want to find the 10 trading days in df before a specific day, for example 2008-01-12.

I have thought of using dplyr like the following:

df %>% select(derv,Market,date) %>%
            filter(date > 2008-01-12 - 10 & Date <2008-01-12)

but the issue I am having is how to index the 10 trading days before the specific day. The code above does not work, and I do not know how to handle this when using dplyr.
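Part of why the attempt above fails is that the unquoted 2008-01-12 is ordinary arithmetic (2008 - 1 - 12), not a date. The code also refers to Market and Date while the printed columns are market and date. The sketch below uses the sample rows from the question to show the difference. It also shows that even a correct date comparison counts calendar days, weekends included, so it returns fewer than 10 trading days. That is why the answer below counts rows instead. The name cal_window is only illustrative.

```r
library(dplyr)

# Unquoted, 2008-01-12 is plain arithmetic: 2008 - 1 - 12 = 1995
2008-01-12 - 10            # 1985, a number rather than a date

target <- as.Date("2008-01-12")
target - 10                # "2008-01-02": arithmetic on a Date shifts by days

# The sample rows printed in the question (columns derv, market, date)
df <- data.frame(
  derv   = c(-10.7803563, -15.6922552, -15.7648483, -10.2294744, -0.5918593, 8.1518987),
  market = "S&P 500 Index",
  date   = as.Date(c("2008-01-02", "2008-01-03", "2008-01-04",
                     "2008-01-07", "2008-01-08", "2008-01-09"))
)

# Rows in the 10 *calendar* days before the target (weekends count toward the 10)
cal_window <- df %>% filter(date >= target - 10, date < target)
cal_window
```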

Another concern is that the specific day (e.g. 2008-01-12) may or may not be in df. If the specific day is in df, I think I only need to go back 9 days; but if it is not in df, I need to go back 10 indices. I am not sure whether this is correct, and this is the part that confuses me.
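One way to avoid switching between 9 and 10 is to filter with a strict `<` first. The target day is then dropped whether or not it is present, and the last 10 remaining rows are always the 10 trading days before it. The sketch below assumes df is sorted, or can be sorted, by date and that dplyr is version 1.0.0 or later (for slice_tail()). The helper last_n_before() and the weekday-only data are hypothetical.

```r
library(dplyr)

# Hypothetical helper: the n rows (trading days) strictly before `target`
last_n_before <- function(data, target, n = 10) {
  data %>%
    arrange(date) %>%                    # rows must be in date order
    filter(date < as.Date(target)) %>%   # drops the target day if it is present
    slice_tail(n = n)                    # keeps the last n remaining rows
}

# Hypothetical weekday-only data: 2008-01-02 .. 2008-01-22, weekends skipped
df <- data.frame(
  derv = seq_len(15),
  date = as.Date("2008-01-01") + c(1:3, 6:10, 13:17, 20:21)
)

in_df     <- last_n_before(df, "2008-01-17")  # a trading day present in df
not_in_df <- last_n_before(df, "2008-01-19")  # a Saturday, absent from df
nrow(in_df)
nrow(not_in_df)
```

Both calls return 10 rows. If the target day should instead be included when it is present, `<=` in place of `<` would make it the last of the 10 rows.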

Would greatly appreciate any insight.

Haboryme · Accepted Answer

Using dplyr and data.table::rleid()
Example data:

set.seed(123)
df=data.frame(derv=rnorm(18),Date=as.Date(c(1,2,3,4,6,7,9,11,12,13,14,15,18,19,20,21,23,24),origin="2008-01-01"))

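A column with an index is created so that no more than 10 trading days before the chosen date are kept. rleid() gives consecutive rows with the same date the same id, so this relies on df being sorted by Date.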

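library(dplyr)
library(data.table)  # only needed for rleid()

df %>%
  arrange(Date) %>%                          # rleid() needs rows in date order
  filter(Date < as.Date("2008-01-19")) %>%   # keep days strictly before the chosen date
  mutate(id = rleid(Date)) %>%               # one consecutive id per distinct date
  filter(id > (max(id) - 10)) %>%            # keep the last 10 distinct dates
  select(derv, Date)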

         derv       Date
1  -1.0678237 2008-01-04
2  -0.2179749 2008-01-05
3  -1.0260044 2008-01-07
4  -0.7288912 2008-01-08
5  -0.6250393 2008-01-10
6  -1.6866933 2008-01-12
7   0.8377870 2008-01-13
8   0.1533731 2008-01-14
9  -1.1381369 2008-01-15
10  1.2538149 2008-01-16
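The question's df also has a market column. If it held several markets, one possible extension is to group by market and take the last 10 rows of each group. This is an assumption and not part of the accepted answer. The two-market data below is hypothetical, and slice_tail() requires dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical data: two markets sharing the same 13 weekdays, 2008-01-02 .. 2008-01-18
dates <- as.Date("2008-01-01") + c(1:3, 6:10, 13:17)
df2 <- data.frame(
  market = rep(c("S&P 500 Index", "Market B"), each = length(dates)),
  date   = rep(dates, times = 2),
  derv   = seq_len(2 * length(dates))
)

res <- df2 %>%
  filter(date < as.Date("2008-01-19")) %>%   # days strictly before the chosen date
  group_by(market) %>%
  arrange(date, .by_group = TRUE) %>%        # date order within each market
  slice_tail(n = 10) %>%                     # last 10 trading days per market
  ungroup()

count(res, market)
```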

EDIT: Procrastinatus Maximus' solution is shorter and only requires dplyr:

df %>%
  filter(Date < "2008-01-19") %>%
  filter(row_number() > (max(row_number()) - 10))  # keep the last 10 rows

This gives the same output here, because each date appears only once in the example data.
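The two versions differ when a date can appear more than once, for example when several markets share one data frame. row_number() counts rows, while rleid() counts distinct consecutive dates. The sketch below uses hypothetical data with two rows per date to show the difference. It calls data.table::rleid() directly so that data.table's first(), last() and between() do not mask dplyr's.

```r
library(dplyr)

# Hypothetical data: two markets, so every one of the 12 dates appears twice
df3 <- data.frame(
  market = rep(c("A", "B"), times = 12),
  date   = rep(as.Date("2008-01-01") + c(1:3, 6:10, 13:16), each = 2)
)

# Last 10 rows: only 5 distinct dates, since each date has 2 rows
by_row <- df3 %>%
  arrange(date) %>%
  filter(row_number() > n() - 10)

# Last 10 distinct dates: 20 rows
by_date <- df3 %>%
  arrange(date) %>%
  mutate(id = data.table::rleid(date)) %>%
  filter(id > max(id) - 10)

n_distinct(by_row$date)
n_distinct(by_date$date)
```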
