set92
set92

Reputation: 324

Merge 2 dataframes by 3 columns and range of a date

I have two dataframes which I want to merge, in one I have the sales of products

    product   quantity   client   date       door
0   81007104  30         0        2014-01-05   40
1   81007104  41         1        2014-07-05   44

And another dataframe in what I have the contracts for the sales, so

   product    client    door    valid_from  valid_until  contract_num 
0  81007104   71        6       2013-02-01  2014-02-01   25470
1  81007104   71        6       2014-05-01  2014-11-01   25944

So I want to join both dataframes using the columns of product, client, door and date between the range between valid from and valid until.

Without the date I think it will be

pd.merge(df1, df2, how='left', on=['product','client','door'])

but since I want the date to be between valid_from and valid_until the previous merge don't work.

With the previous merge I get

    product     quantity    date        client  door    valid from  valid until 
0   81007104    44.0        2014-03-05  71      6       2013-02-01  2014-02-01
1   81007104    44.0        2014-03-05  71      6       2014-05-01  2014-11-01  
2   81007104    44.0        2014-03-05  71      6       2014-11-02  2015-07-20

But date is not between valid_from and valid_until, because what I want would be for the dates to be in the range of valid_from and valid_until

    product     quantity    date        client  door    valid from  valid until 
0   81007104    44.0        2014-01-05  71      6       2013-02-01  2014-02-01
1   81007104    44.0        2014-07-02  71      6       2014-05-01  2014-11-01  
2   81007104    44.0        2015-06-01  71      6       2014-11-02  2015-07-20

Upvotes: 1

Views: 52

Answers (1)

Neo
Neo

Reputation: 627

I suggest you convert dates to pd.datetime, like this:

#create data
df1 = pd.DataFrame(data = {'product': ['81007104','81007104'], 
                          'quantity': ['30','41'],
                          'client': ['0','1'],
                          'date': ['20140105','20140705'],
                          'door': ['40','44']})  
df2 = pd.DataFrame(data = {'product': ['81007104','81007104'], 
                          'client': ['0','1'],
                          'valid from': ['20140105','20140705'],
                          'valid until': ['20140105','20140705'],
                          'door': ['6','6'],
                          'contact num': ['25470','25944']})
# convert to datetime     
df1.date = pd.to_datetime(df1.date, format='%Y%m%d', errors='ignore')
df2['valid from'] = pd.to_datetime(df2['valid from'], format='%Y%m%d', errors='ignore')
df2['valid until'] = pd.to_datetime(df2['valid until'], format='%Y%m%d', errors='ignore')

then merge tables on the common features you want (maybe inner join is what you are looking for, and not left. I cannot understand from your description) and you can filter the dataframe according to dates afterwards, which is easy after converting to pd.datetime.

Upvotes: 2

Related Questions