theStud54
theStud54

Reputation: 695

Pandas Left Merge / Join not Resulting in what is expected for left join

So I may simply not be very informed on what a left join is, because I am getting tripped up... Here is my definition of a left join:

Includes matched records from the left and right tables and unmatched records from the LEFT table in the output table.

Here is my example:

In[87]: df1 = DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'b'], 'data1': range(6)})

In[88]: df2 = DataFrame({'key': ['a', 'b', 'a', 'b', 'd'], 'data2': range(5)})

In[89]: pd.merge(df1, df2, on='key', how='left')

Out[86]: 
    data1 key  data2
0       0   b      1
1       0   b      3
2       1   b      1
3       1   b      3
4       2   a      0
5       2   a      2
6       3   c    NaN
7       4   a      0
8       4   a      2
9       5   b      1
10      5   b      3

HOWEVER!!! I expect to get this:

    data1 key  data2
0       0   b      1
1       1   b      1
2       2   a      0
3       3   c      NaN
4       4   a      0
5       5   b      1

My general thinking comes from Transactional data (such as accounting documents where I may be merging header and item detail, or merging lookup data).

What am I missing in my thinking or code to make this work?

PS - This comes from working through Wes McKinney's Python for Data Analysis book (page 179) - where he mentions the following:

Many-to-many merges have well-defined though not necessarily intuitive behavior. Many-to-many joins form the Cartesian product of the rows. Since there were 3 'b' rows in the left DataFrame and 2 in the right one, there are 6 'b' rows in the result.

I guess I am missing the point here?

Upvotes: 3

Views: 3483

Answers (1)

Fabio Lamanna
Fabio Lamanna

Reputation: 21574

A way to get your expected output is to groupby by data1 and get the first value of each group:

g = df.groupby('data1').first().reset_index()

That returns:

   data1 key  data2
0      0   b      1
1      1   b      1
2      2   a      0
3      3   c    NaN
4      4   a      0
5      5   b      1

Hope that helps.

Upvotes: 2

Related Questions