Reputation: 129
df1 = pd.DataFrame({'name': ['AUD','CAD', 'SMI','Joy', 'SHA', 'CHY', 'AUD', 'KRL'],
'sal': [200,300,600,500,300,200,100,350]})
df2 = pd.DataFrame({'name': ['SMI','Joy', 'SHA', 'CHY'],
'sal': [600,500,300,200]})
I want to compare above two dataframes by 'name' column and if values of df2 'name' is not available, then df1 'name' replace to 'Others'. My expected output as below
new_df = pd.DataFrame({'name': ['Others','Others' ,'SMI','Joy', 'SHA', 'CHY','Others','Others'],
'sal': [200,300,600,500,300,200,100,350]})
Upvotes: 2
Views: 1113
Reputation: 12018
You should use the pandas where
method. Essentially, this performs a substitution based on a boolean mapper performed on an iterator:
df1['new_name'] = df1['name'].where(
df1['name'].isin(df2['name'],
df1['name'],
'Others')
To break this down, you can create a boolean column and map it yourself:
map = df1['name'].isin(df2['name'])
df1['name'].where(map, 'Others')
Upvotes: 0
Reputation: 34046
One-liner using just np.where
:
In [1164]: df1.name = np.where(df1['name'].isin(df2['name']), df1.name, 'Others')
OR:
You can use df.merge
with numpy.where
:
In [1152]: import numpy as np
In [1153]: res = df2.merge(df1, on='name', how='right')
In [1155]: res.name = np.where(res.sal_x.isna(), 'Others', res.name)
In [1159]: res = res.drop('sal_x', 1).rename(columns={'sal_y':'sal'})
In [1160]: res
Out[1160]:
name sal
0 Others 200
1 Others 100
2 Others 300
3 SMI 600
4 Joy 500
5 SHA 300
6 CHY 200
7 Others 350
Upvotes: 1
Reputation: 150735
Let's try where
and isin
:
df1['name'] = df1['name'].where(df1['name'].isin(df2['name']), 'Others')
Output (df1
):
name sal
0 Others 200
1 Others 300
2 SMI 600
3 Joy 500
4 SHA 300
5 CHY 200
6 Others 100
7 Others 350
Upvotes: 1