Reputation: 53
I have two dataframes created like this:
date1 = pd.date_range('1/1/2011', periods=3, freq='8H')
date2 = pd.date_range('1/1/2011', periods=2, freq='12H')
df1 = pd.DataFrame({'cat1': ['1', '1', '2']}, index = date1)
df2 = pd.DataFrame({'cat1': ['1', '2'],
'cat2': ['111', '222']}, index = date2)
df1.index = df1.index.map(lambda x: x.to_pydatetime().date())
df2.index = df2.index.map(lambda x: x.to_pydatetime().date())
They look like this:
print(df1)
print(df2)
cat1
2011-01-01 1
2011-01-01 1
2011-01-01 2
cat1 cat2
2011-01-01 1 111
2011-01-01 2 222
I would like to merge them on datetime index and the cat1 column. I would like to get this:
cat1 cat2
2011-01-01 1 111
2011-01-01 1 111
2011-01-01 2 222
I attempt to get this by using:
df3 = df1.merge(df2, how='inner', right_index=True, left_index=True, on='cat1')
But, instead I'm getting this:
cat1 cat2
2011-01-01 1 111
2011-01-01 1 222
2011-01-01 1 111
2011-01-01 1 222
2011-01-01 2 111
2011-01-01 2 222
I'm sure that I'm overlooking something right in front of me, but I can't figure it out.
I actually over simplified the above scenario, so that the solution could not fix my actual problem. In my actual problem I have more dates.
The following more accurately represents my actual problem:
date1 = pd.DatetimeIndex(['2017-05-14', '2017-05-14', '2017-05-14', '2017-05-15'])
date2 = pd.DatetimeIndex(['2017-05-14', '2017-05-15', '2017-05-14', '2017-05-15'])
df1 = pd.DataFrame({'cat1': ['200', '200', '200', '100']}, index = date1)
df2 = pd.DataFrame({'cat1': ['200', '200', '100', '100'],
'cat2': ['111', '222', '333', '444']}, index = date2)
df3 = df1.merge(df2, how='inner', left_index=True, right_index=True, on='cat1')
print(df1)
print(df2)
print(df3)
cat1
2017-05-14 200
2017-05-14 200
2017-05-14 200
2017-05-15 100
cat1 cat2
2017-05-14 200 111
2017-05-15 200 222
2017-05-14 100 333
2017-05-15 100 444
cat1 cat2
2017-05-14 200 111
2017-05-14 200 222
2017-05-14 200 111
2017-05-14 200 222
2017-05-14 200 111
2017-05-14 200 222
2017-05-15 100 333
2017-05-15 100 444
But, for df3 I expect to get this:
cat1 cat2
2017-05-14 200 111
2017-05-14 200 111
2017-05-14 200 111
2017-05-15 100 444
Thank you!
Upvotes: 2
Views: 4131
Reputation: 477
Almost there!
df3 = df1.merge(df2, how='inner', right_index=True, left_index=False, on='cat1')
Note the left_index
kwarg :)
More in general, you may need to reset/set the index, so you can join based on indexes or on columns, without mixing the two. This should be working for the general case (now you are simply performing a join on two different columns, dropping both indexes):
df3 = pd.merge(df1.reset_index(), df2.reset_index(), on=['index','cat1']).set_index('index')
This is the same, ma more similar to previous solution
df3 = df1.reset_index().merge(df2.reset_index(), on=['index','cat1']).set_index('index')
Upvotes: 1