haimen
haimen

Reputation: 2015

Take dates from one dataframe and filter data in another dataframe

I have two dataframes,

user=c(rep('A',7),rep('B',8))
data = seq(1:15)
date = as.Date(c('2016-01-01','2016-01-02','2016-01-03','2016-01-04','2016-01-05','2016-01-06','2016-01-07','2016-01-08','2016-01-09','2016-01-10','2016-01-11','2016-01-12','2016-01-13','2016-01-14','2016-01-15'))
df = data.frame(user,date,data)

df

        user   date      data
    1     A 2016-01-01    1
    2     A 2016-01-02    2
    3     A 2016-01-03    3
    4     A 2016-01-04    4
    5     A 2016-01-05    5
    6     A 2016-01-06    6
    7     A 2016-01-07    7
    8     B 2016-01-08    8
    9     B 2016-01-09    9
    10    B 2016-01-10   10
    11    B 2016-01-11   11
    12    B 2016-01-12   12
    13    B 2016-01-13   13
    14    B 2016-01-14   14
    15    B 2016-01-15   15

and

df1 =data.frame(user = c('A','B'), start_date = as.Date(c('2016-01-02','2016-01-10')),  end_date = as.Date(c('2016-01-06','2016-01-14')))
> df1
  user start_date   end_date
1    A 2016-01-02 2016-01-06
2    B 2016-01-10 2016-01-14

I want to take the start date and end date from df1 , and filter the records in the date column of df dataframe. The data for a particular user should be only between the start_date and end_date of df1. The resulting dataframe should have the following output,

user   date      data 
  A  2016-01-02    2
  A  2016-01-03    3
  A  2016-01-04    4
  A  2016-01-05    5
  A  2016-01-06    6
  B  2016-01-10   10
  B  2016-01-11   11
  B  2016-01-12   12
  B  2016-01-13   13
  B  2016-01-14   14

I have tried the following,

Looping through each user, passing it to a dataframe. Then filtering it again with the start_date and end_date of corresponding entry in df1, and then appending it to a new dataframe. This is taking a very long time for me since the data is very huge. Is there a more efficient way to do this?

Thanks

Upvotes: 5

Views: 1321

Answers (2)

Arun
Arun

Reputation: 118889

With the recently implemented non-equi joins feature in the data.table v1.9.8+, this can be done as follows:

require(data.table) # v1.9.8+
setDT(df)[df1, .(user,date,data), on=.(user, date>=start_date, date<=end_date)]

Upvotes: 3

adaien
adaien

Reputation: 1942

library(dplyr)
df<-left_join(df,df1,by="user")
df <- df %>% filter(date>=start_date & date<=end_date)

Upvotes: 4

Related Questions