Reputation: 4307
Can somebody please explain this result to me? In particular, I don't know where the NaN
s come from in the result. Also, I don't know how the join
will decide what row to match with what row in this case.
left_df = pd.DataFrame.from_dict({'unique_l':[0, 1, 2, 3, 4], 'join':['a', 'a', 'b','b', 'c'] })
right_df = pd.DataFrame.from_dict({'unique_r':[10, 11, 12, 13, 14], 'join':['a', 'b', 'b','c', 'c'] })
join unique_l
0 a 0
1 a 1
2 b 2
3 b 3
4 c 4
join unique_r
0 a 10
1 b 11
2 b 12
3 c 13
4 c 14
print left_df.join(right_df, on='join', rsuffix='_r')
join unique_l join_r unique_r
0 a 0 NaN NaN
1 a 1 NaN NaN
2 b 2 NaN NaN
3 b 3 NaN NaN
4 c 4 NaN NaN
Upvotes: 3
Views: 121
Reputation:
The join
method makes use of indices. What you want is merge
:
In [6]: left_df.merge(right_df, on="join", suffixes=("_l", "_r"))
Out[6]:
join unique_l unique_r
0 a 0 10
1 a 1 10
2 b 2 11
3 b 2 12
4 b 3 11
5 b 3 12
6 c 4 13
7 c 4 14
Here is a related (but, IMO, not quite a duplicate) question that explains the difference between join
and merge
in more detail.
Upvotes: 3