Reputation: 1338
I have a dataframe to which I wan't to add a column with a kind of status if there is a matching value in another dataframe. I have the current code which works:
df1['NewColumn'] = df1['ComparisonColumn'].apply(lambda x: 'Match' if any(df2.ComparisonColumn == x) else ('' if x is None else 'Missing'))
I know the line is ugly, but I get the impression that its inefficient. Can you suggest a better way to make this comparison?
Upvotes: 1
Views: 266
Reputation: 153460
You can use np.where
, isin
, and isnull
:
Create some dummy data:
np.random.seed(123)
df = pd.DataFrame({'ComparisonColumn':np.random.randint(10,20,20)})
df.iloc[4] = np.nan #Create missing data
df2 = pd.DataFrame({'ComparisonColumn':np.random.randint(15,30,20)})
Do matching with np.where
:
df['NewColumn'] = np.where(df.ComparisonColumn.isin(df2.ComparisonColumn),'Matched',np.where(df.ComparisonColumn.isnull(),'Missing',''))
Output:
ComparisonColumn NewColumn
0 12.0
1 12.0
2 16.0 Matched
3 11.0
4 NaN Missing
5 19.0 Matched
6 16.0 Matched
7 11.0
8 10.0
9 11.0
10 19.0 Matched
11 10.0
12 10.0
13 19.0 Matched
14 13.0
15 14.0
16 10.0
17 10.0
18 14.0
19 11.0
Upvotes: 1