Hiep Xuan Dinh
Hiep Xuan Dinh

Reputation: 37

How pandas merge works?

I do not know how the code work. I got difference results for same method.

I have tried to run the code in Jupyter Notebook but do not know how.

I don't understand what difference between two types of key: ['foo', 'bar', 'baz', 'foo'] and ['A', 'B', 'C', 'D']?

Could anyone explain it ?

Thank you so much !

```https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html

   df1 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'],
                      'value': [1, 2, 3, 5]})
```df2 = pd.DataFrame({'rkey': ['foo', 'bar', 'baz', 'foo'],
                      'value': [5, 6, 7, 8]})
   df1
    lkey value
0   foo      1
1   bar      2
2   baz      3
3   foo      5
   df2
    rkey value
0   foo      5
1   bar      6
2   baz      7
3   foo      8

```df1.merge(df2, left_on='lkey', right_on='rkey')
  lkey  value_x rkey  value_y
0  foo        1  foo        5
1  foo        1  foo        8
2  foo        5  foo        5
3  foo        5  foo        8
4  bar        2  bar        6
5  baz        3  baz        7

```When I tried to create my code, like belows:


``` df3 = pd.DataFrame({'lkey': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 5})    
df4 = pd.DataFrame({'rkey': ['A', 'B', 'C', 'D'], 'value': [5, 6, 7, 8})

df3
    lkey    value
0   A       1
1   B       2
2   C       3
3   D       5

df4

    rkey    value
0   A       5
1   B       6
2   C       7
3   D       8

pd.merge(df3, df4, left_on='lkey', right_on='rkey', how='inner')

    lkey    value_x rkey    value_y
0      A          1    A    5
1      B          2    B    6
2      C          3    C    7
3      D          5    D    8

Upvotes: 1

Views: 206

Answers (2)

grace
grace

Reputation: 72

There are 4 types of merges:

  • Inner: keep only the rows that match in both df with the key you specify.
  • Outer: keep all the rows in both dataframes of the key you specify.
  • Right: in the code, the dataframe you specify in second place, will keep all the registries and pull only those that are common in the 1st dataframe.
  • Left: in the code, the dataframe you specify in first place, will keep all the registries and pull only those that are common in the second df.

This diagram is very clear: enter image description here

Upvotes: 3

Elegant Code
Elegant Code

Reputation: 690

Merge (how='inner', which is also the default) creates a new row in the merged dataframe for every "key" in df1 that matches a "key" in df2.

Notice that in the first example, one of the "keys", "foo", shows up twice in both dataframes, that is what leads to four "combinations" of the "key "foo" in the merged dataframe in the first example.

However, in your code (the second example), you have one of each "key": ["A", "B", "C", "D"] so there is only one "match" for each row, and therefore it generates only one row in the merged dataframe.

Hope this helps. Let me know if you need further clarifications.

Upvotes: 0

Related Questions