How pandas merge works?

Question

I do not know how the code work. I got difference results for same method.

I have tried to run the code in Jupyter Notebook but do not know how.

I don't understand what difference between two types of key: ['foo', 'bar', 'baz', 'foo'] and ['A', 'B', 'C', 'D']?

Could anyone explain it ?

Thank you so much !

```https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html

   df1 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'],
                      'value': [1, 2, 3, 5]})
```df2 = pd.DataFrame({'rkey': ['foo', 'bar', 'baz', 'foo'],
                      'value': [5, 6, 7, 8]})
   df1
    lkey value
0   foo      1
1   bar      2
2   baz      3
3   foo      5
   df2
    rkey value
0   foo      5
1   bar      6
2   baz      7
3   foo      8

```df1.merge(df2, left_on='lkey', right_on='rkey')
  lkey  value_x rkey  value_y
0  foo        1  foo        5
1  foo        1  foo        8
2  foo        5  foo        5
3  foo        5  foo        8
4  bar        2  bar        6
5  baz        3  baz        7

```When I tried to create my code, like belows:


``` df3 = pd.DataFrame({'lkey': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 5})    
df4 = pd.DataFrame({'rkey': ['A', 'B', 'C', 'D'], 'value': [5, 6, 7, 8})

df3
    lkey    value
0   A       1
1   B       2
2   C       3
3   D       5

df4

    rkey    value
0   A       5
1   B       6
2   C       7
3   D       8

pd.merge(df3, df4, left_on='lkey', right_on='rkey', how='inner')

    lkey    value_x rkey    value_y
0      A          1    A    5
1      B          2    B    6
2      C          3    C    7
3      D          5    D    8

grace · Accepted Answer

There are 4 types of merges:

Inner: keep only the rows that match in both df with the key you specify.
Outer: keep all the rows in both dataframes of the key you specify.
Right: in the code, the dataframe you specify in second place, will keep all the registries and pull only those that are common in the 1st dataframe.
Left: in the code, the dataframe you specify in first place, will keep all the registries and pull only those that are common in the second df.

This diagram is very clear:

How pandas merge works?

Answers (2)

Related Questions