Reputation: 207
pandas.merge acts differently for the left and right sides!!! For the left side if we use left_on and left_index together it shows an error, but the same for the right side works!!!
Code:
import pandas as pd
import numpy as np
right = pd.DataFrame(data=np.arange(12).reshape((6,2)),index=[['Nevada', 'Nevada', 'Ohio', 'Ohio', 'Ohio', 'Ohio'],[2001, 2000, 2000, 2000, 2001, 2002]],columns=['event1','event2'])
left = pd.DataFrame(data={'key1':['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],'key2':[2000, 2001, 2002, 2001, 2002],'data':np.arange(5.)})
pd.merge(left,right,right_index=True,left_index=True,right_on='event1')#it works and returns an empty table which is expected
pd.merge(left,right,left_index=True,right_index=True,left_on='key1')# it makes error !!!
Upvotes: 0
Views: 3900
Reputation: 17560
You have a few issues going on. First your merge statements are not constructed correctly. You shouldn't be using both a left_on
and left_index
or right_on
and right_index
at the same time. You should use only one left option and one right option.
The reason you get an error in your second statement is because the index levels do not match. In your left merge, the left index is a single level, and you while you specify both right_index=True
and right_on='event1'
, the right_on
attribute is taking precedence. Since both are single level integers, there is no problem. I should point out that the merge, if constructed correctly, (pd.merge(left, right, left_index=True, right_on='event1', how='left')
) does not produce an empty DataFrame... See code below.
In your right merge, you specify using the right index with right_index=True
and left_on
takes precedence over left_index=True
. The issue here is that the right index is 2 levels, where as your 'key1` field is a single level string.
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: right = pd.DataFrame(data=np.arange(12).reshape((6,2)),index=[['Nevada', 'Nevada', 'Ohio', 'Ohio', 'Ohio', 'Ohio'],[2001, 2000, 2000, 2000, 2001, 2002]],columns=['event1','event2'])
In [4]: left = pd.DataFrame(data={'key1':['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],'key2':[2000, 2001, 2002, 2001, 2002],'data':np.arange(5.)})
In [5]: left
Out[5]:
data key1 key2
0 0 Ohio 2000
1 1 Ohio 2001
2 2 Ohio 2002
3 3 Nevada 2001
4 4 Nevada 2002
In [6]: right
Out[6]:
event1 event2
Nevada 2001 0 1
2000 2 3
Ohio 2000 4 5
2000 6 7
2001 8 9
2002 10 11
In [5]: left_merge = left.merge(right, left_index=True, right_on='event1', how='left')
In [7]: left_merge
Out[7]:
data key1 key2 event1 event2
Nevada 2001 0 Ohio 2000 0 1
Ohio 2002 1 Ohio 2001 1 NaN
Nevada 2000 2 Ohio 2002 2 3
Ohio 2002 3 Nevada 2001 3 NaN
2000 4 Nevada 2002 4 5
Upvotes: 2