RylonMcnz
RylonMcnz

Reputation: 53

Merge two dataframes on datetime index and column

I have two dataframes created like this:

date1 = pd.date_range('1/1/2011', periods=3, freq='8H')
date2 = pd.date_range('1/1/2011', periods=2, freq='12H')
df1 = pd.DataFrame({'cat1': ['1', '1', '2']}, index = date1)
df2 = pd.DataFrame({'cat1': ['1', '2'],
                    'cat2': ['111', '222']}, index = date2)
df1.index = df1.index.map(lambda x: x.to_pydatetime().date())
df2.index = df2.index.map(lambda x: x.to_pydatetime().date())

They look like this:

print(df1)
print(df2)

           cat1
2011-01-01    1
2011-01-01    1
2011-01-01    2
           cat1 cat2
2011-01-01    1  111
2011-01-01    2  222

I would like to merge them on datetime index and the cat1 column. I would like to get this:

           cat1 cat2
2011-01-01    1  111
2011-01-01    1  111
2011-01-01    2  222

I attempt to get this by using:

df3 = df1.merge(df2, how='inner', right_index=True, left_index=True, on='cat1')

But, instead I'm getting this:

           cat1 cat2
2011-01-01    1  111
2011-01-01    1  222
2011-01-01    1  111
2011-01-01    1  222
2011-01-01    2  111
2011-01-01    2  222

I'm sure that I'm overlooking something right in front of me, but I can't figure it out.

I actually over simplified the above scenario, so that the solution could not fix my actual problem. In my actual problem I have more dates.

The following more accurately represents my actual problem:

date1 = pd.DatetimeIndex(['2017-05-14', '2017-05-14', '2017-05-14', '2017-05-15'])
date2 = pd.DatetimeIndex(['2017-05-14', '2017-05-15', '2017-05-14', '2017-05-15'])
df1 = pd.DataFrame({'cat1': ['200', '200', '200', '100']}, index = date1)
df2 = pd.DataFrame({'cat1': ['200', '200', '100', '100'],
                    'cat2': ['111', '222', '333', '444']}, index = date2)

df3 = df1.merge(df2, how='inner', left_index=True, right_index=True, on='cat1')

print(df1)
print(df2)
print(df3)

           cat1
2017-05-14  200
2017-05-14  200
2017-05-14  200
2017-05-15  100
           cat1 cat2
2017-05-14  200  111
2017-05-15  200  222
2017-05-14  100  333
2017-05-15  100  444
           cat1 cat2
2017-05-14  200  111
2017-05-14  200  222
2017-05-14  200  111
2017-05-14  200  222
2017-05-14  200  111
2017-05-14  200  222
2017-05-15  100  333
2017-05-15  100  444

But, for df3 I expect to get this:

           cat1 cat2
2017-05-14  200  111
2017-05-14  200  111
2017-05-14  200  111
2017-05-15  100  444

Thank you!

Upvotes: 2

Views: 4131

Answers (1)

pazqo
pazqo

Reputation: 477

Almost there!

df3 = df1.merge(df2, how='inner', right_index=True, left_index=False, on='cat1')

Note the left_index kwarg :)

More in general, you may need to reset/set the index, so you can join based on indexes or on columns, without mixing the two. This should be working for the general case (now you are simply performing a join on two different columns, dropping both indexes):

df3 = pd.merge(df1.reset_index(), df2.reset_index(), on=['index','cat1']).set_index('index')

This is the same, ma more similar to previous solution

df3 = df1.reset_index().merge(df2.reset_index(), on=['index','cat1']).set_index('index') 

Upvotes: 1

Related Questions