Reputation: 904
I have two dataframes:
df:
portfolio symbol id var1 var2 var3
df1:
symbol sector market count
I want to add the columns sector and market from df1 to df. df1 has unique values for symbol, so it is a smaller dataframe than df, which is the original dataframe.
I tried:
pd.merge(df,df1,on='symbol',how='outer')
But the output has more rows than expected. Can anyone tell me what I am missing here?
Thanks
Upvotes: 1
Views: 78
Reputation: 904
My apologies, I didn't realise that an outer join also creates rows for symbols in the second dataframe that are not present in the first dataframe. That is why I was getting extra rows. To remove them, I added df7 = df.dropna(subset=['symbol'])
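For anyone who wants to reproduce the effect, here is a minimal sketch. The sample values are made up for illustration (only the column names come from the question), and it assumes df1 contains a symbol that df does not. It also shows how='left', which may be a more direct way to keep exactly the rows of df.

```python
import pandas as pd

# Hypothetical sample data: column names follow the question, values are invented.
df = pd.DataFrame({
    "portfolio": ["p1", "p1", "p2"],
    "symbol": ["AAA", "BBB", "AAA"],
    "id": [1, 2, 3],
})
df1 = pd.DataFrame({
    "symbol": ["AAA", "BBB", "CCC"],  # "CCC" does not occur in df
    "sector": ["tech", "energy", "health"],
    "market": ["US", "US", "EU"],
    "count": [2, 1, 0],
})

# Outer join: keeps every symbol from either side, so "CCC" adds a row
# whose df columns (portfolio, id) are NaN.
outer = pd.merge(df, df1, on="symbol", how="outer")

# Left join: keeps exactly the rows of df and only looks up matches in df1.
left = pd.merge(df, df1, on="symbol", how="left")

print(len(df), len(outer), len(left))  # 3 4 3
```

Because symbol is unique in df1, the left join cannot duplicate rows of df, so the result has the same length as df.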
Upvotes: 1
Reputation: 55
If you do an outer join, the result keeps every symbol found in either dataframe: all the rows of df, plus one extra row for each symbol that appears only in df1. If you only want rows whose symbol appears in both dataframes, you should use an inner join.
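One way to check where the rows of an outer join come from is the indicator parameter of merge. This is a rough sketch with made-up data (the values and the single var1 column are assumptions for illustration, not the asker's data):

```python
import pandas as pd

# Hypothetical data: "ZZZ" exists only in df, "CCC" exists only in df1.
df = pd.DataFrame({
    "symbol": ["AAA", "AAA", "BBB", "ZZZ"],
    "var1": [1, 2, 3, 4],
})
df1 = pd.DataFrame({
    "symbol": ["AAA", "BBB", "CCC"],
    "sector": ["tech", "energy", "health"],
})

# indicator=True adds a "_merge" column with the values
# "both", "left_only" or "right_only" for each output row.
outer = df.merge(df1, on="symbol", how="outer", indicator=True)
counts = outer["_merge"].value_counts()

# The inner join keeps only the rows that would be tagged "both".
inner = df.merge(df1, on="symbol", how="inner")

print(counts)
print(len(outer), len(inner))
```

Rows tagged right_only are the extra ones that come from df1 alone. Rows tagged left_only are rows of df with no match in df1, which an inner join would drop.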
Upvotes: 1
Reputation: 7594
Have you tried doing an inner join?
df.merge(df1, on='symbol', how='inner')
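As a slightly fuller sketch of the same idea, with invented sample values: selecting only the needed columns of df1 keeps count out of the result, and validate='many_to_one' makes pandas raise an error if symbol turns out not to be unique in df1. The lookup name is just a helper variable for this example. Note that an inner join drops rows of df whose symbol is missing from df1. If those rows should stay, how='left' may be the better fit.

```python
import pandas as pd

# Hypothetical data: "ZZZ" has no entry in df1.
df = pd.DataFrame({
    "portfolio": ["p1", "p2", "p2"],
    "symbol": ["AAA", "BBB", "ZZZ"],
    "var1": [1.0, 2.0, 3.0],
})
df1 = pd.DataFrame({
    "symbol": ["AAA", "BBB"],
    "sector": ["tech", "energy"],
    "market": ["US", "EU"],
    "count": [1, 1],
})

# Keep only the key and the two columns that should be added to df.
lookup = df1[["symbol", "sector", "market"]]

# validate="many_to_one" checks that symbol is unique on the right side.
inner = df.merge(lookup, on="symbol", how="inner", validate="many_to_one")
left = df.merge(lookup, on="symbol", how="left", validate="many_to_one")

print(inner)  # the "ZZZ" row is dropped
print(left)   # the "ZZZ" row is kept, with NaN for sector and market
```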
Upvotes: 2