JejeBelfort
JejeBelfort

Reputation: 1663

Joining two Dataframes based on index and column

I have the following two Dataframes:

df1:

            Id
date          
2014-03-13   1
2014-03-14   2
2014-03-15   1

df2:

            Id people  value
date                        
2014-03-13   1      A   -3.0
2014-03-13   1      B   -6.0
2014-03-13   4      C   -3.2
2014-03-14   1      A   -3.1
2014-03-14   2      B   -5.0
2014-03-14   2      C   -3.4
2014-03-14   7      D   -6.2
2014-03-14   8      E   -3.2
2014-03-15   1      A   -3.2
2014-03-15   3      B   -5.9

What I would like to do is to merge these two Dataframes based on the Id, consistently with the index (date).

The desired outcome would be the following:

            Id people  value
date                        
2014-03-13   1      A   -3.0
2014-03-13   1      B   -6.0
2014-03-14   2      B   -5.0
2014-03-14   2      C   -3.4
2014-03-15   1      A   -3.2

I have tried hard to use merge and join, but without success.

The code generating the input is the following:

import pandas as pd

dates = [pd.to_datetime('2014-03-13', format='%Y-%m-%d'), pd.to_datetime('2014-03-14', format='%Y-%m-%d'), pd.to_datetime('2014-03-15', format='%Y-%m-%d')]
Ids = [1,2,1]
df1 = pd.DataFrame({'Id': pd.Series(Ids, index=dates)})
df1.index.name = 'date'

dates = [pd.to_datetime('2014-03-13', format='%Y-%m-%d'), pd.to_datetime('2014-03-13', format='%Y-%m-%d'),
         pd.to_datetime('2014-03-13', format='%Y-%m-%d'), pd.to_datetime('2014-03-14', format='%Y-%m-%d'),
         pd.to_datetime('2014-03-14', format='%Y-%m-%d'),pd.to_datetime('2014-03-14', format='%Y-%m-%d'),
         pd.to_datetime('2014-03-14', format='%Y-%m-%d'), pd.to_datetime('2014-03-14', format='%Y-%m-%d'), 
         pd.to_datetime('2014-03-15', format='%Y-%m-%d'), pd.to_datetime('2014-03-15', format='%Y-%m-%d')]
Ids = [1,1,4,1,2,2,7,8,1,3]
peoples = ['A','B','C','A','B','C','D','E','A','B']
values = [-3,-6,-3.2,-3.1,-5,-3.4,-6.2,-3.2,-3.2,-5.9]
df2 = pd.DataFrame({'Id': pd.Series(Ids, index=dates),
                    'people': pd.Series(peoples, index=dates),
                    'value': pd.Series(values, index=dates)})
df2.index.name = 'date'

Upvotes: 0

Views: 264

Answers (1)

jezrael
jezrael

Reputation: 862406

The simpliest is merge + reset_index:

df = pd.merge(df1.reset_index(), df2.reset_index(), on=['date','Id']).set_index('date')
print (df)
date        Id people  value
2014-03-13   1      A   -3.0
2014-03-13   1      A   -6.0
2014-03-14   2      B   -5.0
2014-03-14   2      A   -3.4
2014-03-15   1      F   -3.2

And also:

df = pd.merge(df1.set_index('Id', append=True), 
             df2.set_index('Id', append=True), 
             left_index=True, 
             right_index=True)
print (df)
              people  value
date       Id              
2014-03-13 1       A   -3.0
           1       A   -6.0
2014-03-14 2       B   -5.0
           2       A   -3.4
2014-03-15 1       F   -3.2

Upvotes: 1

Related Questions