Reputation: 344
I have two pandas df as below:-
df1
Type season name qty
Fruit summer Mango 12
Fruit summer watermelon 23
Fruit summer blueberries 200
vegetable summer Peppers 24
df2
Availability season name city
YEs summer Mango Pune
Yes summer Peppers Mumbai
Yes summer Tomatoes Mumbai
I want to compare df2 column season and name with df1 and return matched rows with an extra column name called status contain (1 represents match,0 represents not match) in df1. In this case like below.
df1
Type season name qty status
Fruit summer Mango 12 1
Fruit summer watermelon 23 0
Fruit summer blueberries 200 0
vegetable summer Peppers 24 1
Upvotes: 0
Views: 68
Reputation: 4761
You can use .isin
in the following way:
df1["status"] = list(zip(df1.season, df1.name))
df1["status"] = df1["status"].isin(list(zip(df2.season, df2.name)))
Output
df1
Type season name qty status
0 Fruit summer Mango 12 True
1 Fruit summer watermelon 23 False
2 Fruit summer blueberries 200 False
3 vegetable summer Peppers 24 True
Performance (vs. @perl's answer)
data = {'Type': {0: 'Fruit', 1: 'Fruit', 2: 'Fruit', 3: 'vegetable'},
'season': {0: 'summer', 1: 'summer', 2: 'summer', 3: 'summer'},
'name': {0: 'Mango', 1: 'watermelon', 2: 'blueberries', 3: 'Peppers'},
'qty': {0: 12, 1: 23, 2: 200, 3: 24}}
#@perl's answer
%%timeit
df1 = pd.DataFrame(data)
df1.merge(
df2[['season', 'name']].assign(status=1),
how='left').fillna(0)
#5.44 ms ± 56.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
#my answer
%%timeit
df1["status"] = list(zip(df1.season, df1.name))
df1["status"].isin(list(zip(df2.season, df2.name)))
#434 µs ± 4.96 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Old (and wrong) answer
You can use .isin
with .to_dict
:
cols = ['season', 'name']
df1['status'] = df1[cols].isin(df2[cols].to_dict('list')).all(1).astype('int')
Output
df1
Type season name qty status
0 Fruit summer Mango 12 1
1 Fruit summer watermelon 23 0
2 Fruit summer blueberries 200 0
3 vegetable summer Peppers 24 1
Upvotes: 0
Reputation: 9941
Here's another option using merge
with how='left'
:
df1.merge(
df2[['season', 'name']].assign(status=1),
how='left').fillna(0)
Output:
Type season name qty status
0 Fruit summer Mango 12 1.0
1 Fruit summer watermelon 23 0.0
2 Fruit summer blueberries 200 0.0
3 vegetable summer Peppers 24 1.0
Upvotes: 4