Reputation: 324
I have two dataframes which I want to merge, in one I have the sales of products
product quantity client date door
0 81007104 30 0 2014-01-05 40
1 81007104 41 1 2014-07-05 44
And another dataframe in what I have the contracts for the sales, so
product client door valid_from valid_until contract_num
0 81007104 71 6 2013-02-01 2014-02-01 25470
1 81007104 71 6 2014-05-01 2014-11-01 25944
So I want to join both dataframes using the columns of product, client, door and date between the range between valid from and valid until.
Without the date I think it will be
pd.merge(df1, df2, how='left', on=['product','client','door'])
but since I want the date to be between valid_from and valid_until the previous merge don't work.
With the previous merge I get
product quantity date client door valid from valid until
0 81007104 44.0 2014-03-05 71 6 2013-02-01 2014-02-01
1 81007104 44.0 2014-03-05 71 6 2014-05-01 2014-11-01
2 81007104 44.0 2014-03-05 71 6 2014-11-02 2015-07-20
But date is not between valid_from and valid_until, because what I want would be for the dates to be in the range of valid_from and valid_until
product quantity date client door valid from valid until
0 81007104 44.0 2014-01-05 71 6 2013-02-01 2014-02-01
1 81007104 44.0 2014-07-02 71 6 2014-05-01 2014-11-01
2 81007104 44.0 2015-06-01 71 6 2014-11-02 2015-07-20
Upvotes: 1
Views: 52
Reputation: 627
I suggest you convert dates to pd.datetime, like this:
#create data
df1 = pd.DataFrame(data = {'product': ['81007104','81007104'],
'quantity': ['30','41'],
'client': ['0','1'],
'date': ['20140105','20140705'],
'door': ['40','44']})
df2 = pd.DataFrame(data = {'product': ['81007104','81007104'],
'client': ['0','1'],
'valid from': ['20140105','20140705'],
'valid until': ['20140105','20140705'],
'door': ['6','6'],
'contact num': ['25470','25944']})
# convert to datetime
df1.date = pd.to_datetime(df1.date, format='%Y%m%d', errors='ignore')
df2['valid from'] = pd.to_datetime(df2['valid from'], format='%Y%m%d', errors='ignore')
df2['valid until'] = pd.to_datetime(df2['valid until'], format='%Y%m%d', errors='ignore')
then merge tables on the common features you want (maybe inner join is what you are looking for, and not left. I cannot understand from your description) and you can filter the dataframe according to dates afterwards, which is easy after converting to pd.datetime.
Upvotes: 2