lll
lll

Reputation: 1109

Filter dataframe based on a date that may or may not be contained in the dataframe

I have a dataframe (df) like the following:

    derv        market       date
 -10.7803563 S&P 500 Index 2008-01-02
 -15.6922552 S&P 500 Index 2008-01-03
 -15.7648483 S&P 500 Index 2008-01-04
 -10.2294744 S&P 500 Index 2008-01-07
  -0.5918593 S&P 500 Index 2008-01-08
   8.1518987 S&P 500 Index 2008-01-09
 .....
  84.1518987 S&P 500 Index 2014-12-31

and I want to find the 10 trading days in df before a specific day. For example, 2008-01-12.

I have thought of using dplyr like the following:

df %>% select(derv,Market,date) %>%
            filter(date > 2008-01-12 - 10 & Date <2008-01-12)

but the issue I am having is about how to index the 10 trading days before the specific day. The code I have above is not working and I do not know how to deal with it in the case of using dplyr.

Another concerning issue is that the specific day (e.g. 2008-01-12) may or may not be in df. If the specific is in df, I think I only need to go back 9 days to count; but it is not in df, I need to go back 10 indices. I am not sure if I am correct here or not, but this is the part making me confused.

Would greatly appreciate any insight.

Upvotes: 0

Views: 155

Answers (2)

Haboryme
Haboryme

Reputation: 4761

Using dplyr and data.table::rleid()
Example data:

set.seed(123)
df=data.frame(derv=rnorm(18),Date=as.Date(c(1,2,3,4,6,7,9,11,12,13,14,15,18,19,20,21,23,24),origin="2008-01-01"))

An column with an index is created in order to select no more than 10 days before the chosen date.

library(dplyr)
library(data.table)
df %>%
  filter(Date < "2008-01-19") %>%
  mutate(id = rleid(Date)) %>%
  filter(id > (max(id)-10)) %>%
  ungroup() %>%
  select(derv,Date)

         derv       Date
1  -1.0678237 2008-01-04
2  -0.2179749 2008-01-05
3  -1.0260044 2008-01-07
4  -0.7288912 2008-01-08
5  -0.6250393 2008-01-10
6  -1.6866933 2008-01-12
7   0.8377870 2008-01-13
8   0.1533731 2008-01-14
9  -1.1381369 2008-01-15
10  1.2538149 2008-01-16

EDIT: Procrastinatus Maximus' solution is shorter and only requires dplyr

df %>% filter(Date < "2008-01-19") %>% filter(row_number() > (max(row_number())-10))

This gives the same output.

Upvotes: 3

user14353
user14353

Reputation: 135

So the answer to this question really depends on how your dates are stored in R. But let's assume ISO 8601, which is what it looks like based on your code.

So first let's make some data.

    mydates <- as.Date("2007-06-22")
    mydates<-c(mydates[1]+1:11, mydates[1]+14:19)

    StockPrice<-c(1:17)

    df<-data.frame(mydates,StockPrice)

Then specify the date of interest like @stats_guy

    dateofinterest<-as.Date("2007-07-11")

I'd say use subset, and just subtract 11 from your date since it's already in that format.

    foo<-subset(df, mydates<dateofinterest & mydates>(dateofinterest-11))

Then you'll have a nice span of 10 days, but I'm not sure if you want 10 trading days? Or just 10 consecutive days, even if that means your list of prices might be < 10. I intentionally made my dataset with breaks like real market data to illustrate that point. So I came up with 8 values over the 10 day period instead of 10. Interested to hear what you're actually looking for.

Say you were actually looking for 10 trading days. Just to be the devil's advocate here you could assume that there won't be more than 10 ten days of no trading. So we go 20 days back in time before your date of interest.

    foo<-subset(df, mydates<dateofinterest & mydates>(dateofinterest-20))

Then we check your subset of data to see if there are more than 10 trading days within it using an if statement. If there are more then 10 rows then you have too many days. We just trim it the subset data, foo, to the right length starting from the bottom (the latest date) and then count up 9 entries from there. Now you have ten trading days in a nice tidy dataset.

    if (nrow(foo)>10){
    foo<-foo[(nrow(foo)-9):(nrow(foo)),]
    }

Upvotes: 1

Related Questions