Reputation: 37
I do not know how the code work. I got difference results for same method.
I have tried to run the code in Jupyter Notebook but do not know how.
I don't understand what difference between two types of key: ['foo', 'bar', 'baz', 'foo'] and ['A', 'B', 'C', 'D']?
Could anyone explain it ?
Thank you so much !
```https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html
df1 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'],
'value': [1, 2, 3, 5]})
```df2 = pd.DataFrame({'rkey': ['foo', 'bar', 'baz', 'foo'],
'value': [5, 6, 7, 8]})
df1
lkey value
0 foo 1
1 bar 2
2 baz 3
3 foo 5
df2
rkey value
0 foo 5
1 bar 6
2 baz 7
3 foo 8
```df1.merge(df2, left_on='lkey', right_on='rkey')
lkey value_x rkey value_y
0 foo 1 foo 5
1 foo 1 foo 8
2 foo 5 foo 5
3 foo 5 foo 8
4 bar 2 bar 6
5 baz 3 baz 7
```When I tried to create my code, like belows:
``` df3 = pd.DataFrame({'lkey': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 5})
df4 = pd.DataFrame({'rkey': ['A', 'B', 'C', 'D'], 'value': [5, 6, 7, 8})
df3
lkey value
0 A 1
1 B 2
2 C 3
3 D 5
df4
rkey value
0 A 5
1 B 6
2 C 7
3 D 8
pd.merge(df3, df4, left_on='lkey', right_on='rkey', how='inner')
lkey value_x rkey value_y
0 A 1 A 5
1 B 2 B 6
2 C 3 C 7
3 D 5 D 8
Upvotes: 1
Views: 206
Reputation: 72
There are 4 types of merges:
Upvotes: 3
Reputation: 690
Merge (how='inner', which is also the default) creates a new row in the merged dataframe for every "key" in df1 that matches a "key" in df2.
Notice that in the first example, one of the "keys", "foo", shows up twice in both dataframes, that is what leads to four "combinations" of the "key "foo" in the merged dataframe in the first example.
However, in your code (the second example), you have one of each "key": ["A", "B", "C", "D"] so there is only one "match" for each row, and therefore it generates only one row in the merged dataframe.
Hope this helps. Let me know if you need further clarifications.
Upvotes: 0