Reputation: 81
I'm wondering what the difference is when you merge by pd.merge versus dataframe.merge(), examples below:
pd.merge(dataframe1, dataframe2)
and
dataframe1.merge(dataframe2)
Upvotes: 8
Views: 4707
Reputation: 738
pandas gives us two functions for almost the same task: pandas.merge() and DataFrame.merge().
pandas.merge(left, right, how='inner', on=None, left_on=None, right_on=None,
left_index=False, right_index=False,
sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)
DataFrame.merge(right, how='inner', on=None, left_on=None, right_on=None,
left_index=False, right_index=False,
sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)
Both look similar, so what's the advantage of using one over the other?
DataFrame.merge() is a thin wrapper that calls pandas.merge() with the calling DataFrame as the left argument, so df1.merge(df2) gives the same result as pd.merge(df1, df2).
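To check the equivalence directly, here is a minimal sketch using two small made-up DataFrames (the column names and values are illustrative, not taken from the question):

```python
import pandas as pd

# Hypothetical toy data, only for illustration
df1 = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
df2 = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

# Function style: both frames are passed as arguments
via_function = pd.merge(df1, df2, on="key")

# Method style: the calling frame plays the role of `left`
via_method = df1.merge(df2, on="key")

# Compare the two results element by element
print(via_function.equals(via_method))
```

Both calls perform the default inner join on "key", so the comparison should come out equal.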
However, pd.merge() is a function that wraps its arguments, while df1.merge() is a method, which makes the latter easier to chain from left to right.
E.g.,
df1.merge(df2).merge(df3)
# reads better [analogous to the %>% pipe operator in R] than
pd.merge(pd.merge(df1, df2), df3)
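The chaining advantage shows up more clearly once other DataFrame methods join the pipeline. A small sketch, assuming three made-up frames that share an "id" column (all names and values here are hypothetical):

```python
import pandas as pd

# Hypothetical toy frames sharing an "id" column
df1 = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
df2 = pd.DataFrame({"id": [1, 2, 3], "score": [10, 20, 30]})
df3 = pd.DataFrame({"id": [2, 3, 4], "group": ["x", "y", "z"]})

# Method style: each step reads left to right / top to bottom
chained = (
    df1.merge(df2, on="id")
       .merge(df3, on="id", how="left")
       .sort_values("score", ascending=False)
       .reset_index(drop=True)
)

# Function style: the same two merges have to be read inside-out
nested = pd.merge(pd.merge(df1, df2, on="id"), df3, on="id", how="left")

print(chained)
print(nested)
```

The two merge steps produce the same rows either way; only the method form lets the sort and index reset follow on naturally.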
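import pandas as pd
from IPython.display import display  # display() is only predefined inside Jupyter/IPython

# read_html returns a list of DataFrames, one per table found on the page
d1 = pd.read_html('https://worldpopulationreview.com/countries')
pop = d1[0]
pop.info()  # Data for 232 countries for 7 columns (info() prints directly and returns None)
print()
pop.head(3)
d2 = pd.read_html('https://worldpopulationreview.com/country-rankings/median-age')
age = d2[0]
age.info()  # Data for 221 countries for 5 columns
print()
age.head(3)
# With no `on` argument, both calls join on the columns the two frames have in common
display('pd.merge(): ', pd.merge(pop, age), 'df.merge(): ', pop.merge(age))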
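Since the example above depends on live web pages, here is an offline sketch of the same comparison. The two frames are made-up stand-ins for the population and median-age tables (column names and values are assumptions). It also uses the indicator parameter from the signatures above to list rows that have no partner in the other table:

```python
import pandas as pd

# Made-up stand-ins for the scraped tables
pop = pd.DataFrame({"country": ["A", "B", "C"], "population": [100, 200, 300]})
age = pd.DataFrame({"country": ["A", "B", "D"], "median_age": [30.5, 41.2, 25.0]})

# No `on` argument: both styles join on the shared column "country"
inner_fn = pd.merge(pop, age)
inner_m = pop.merge(age)

# Raises AssertionError if the two results differ in any way
pd.testing.assert_frame_equal(inner_fn, inner_m)

# Outer join with indicator=True adds a "_merge" column saying
# whether each row came from the left frame, the right frame, or both
outer = pop.merge(age, how="outer", indicator=True)
unmatched = outer[outer["_merge"] != "both"]
print(unmatched)
```

Rows marked left_only or right_only are the ones an inner merge silently drops, which is worth checking when the two tables have different row counts.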
Upvotes: 11