vishwajeet Mane
vishwajeet Mane

Reputation: 344

how to extract pandas dataframe from another dataframe based on multiple column?

I have two pandas df as below:-

df1

Type      season    name        qty
Fruit     summer    Mango        12
Fruit     summer    watermelon   23
Fruit     summer    blueberries  200
vegetable summer    Peppers      24


df2

Availability       season          name      city
  YEs              summer          Mango     Pune
  Yes              summer          Peppers   Mumbai
  Yes              summer          Tomatoes  Mumbai    

I want to compare df2 column season and name with df1 and return matched rows with an extra column name called status contain (1 represents match,0 represents not match) in df1. In this case like below.

df1
Type       season    name        qty   status
Fruit      summer    Mango        12     1
Fruit      summer    watermelon   23     0
Fruit      summer    blueberries  200    0
vegetable  summer    Peppers      24     1

Upvotes: 0

Views: 68

Answers (2)

Pablo C
Pablo C

Reputation: 4761

You can use .isin in the following way:

df1["status"] = list(zip(df1.season, df1.name))
df1["status"] = df1["status"].isin(list(zip(df2.season, df2.name)))

Output

df1
        Type  season         name  qty  status
0      Fruit  summer        Mango   12    True
1      Fruit  summer   watermelon   23   False
2      Fruit  summer  blueberries  200   False
3  vegetable  summer      Peppers   24    True

Performance (vs. @perl's answer)

data = {'Type': {0: 'Fruit', 1: 'Fruit', 2: 'Fruit', 3: 'vegetable'},
 'season': {0: 'summer', 1: 'summer', 2: 'summer', 3: 'summer'},
 'name': {0: 'Mango', 1: 'watermelon', 2: 'blueberries', 3: 'Peppers'},
 'qty': {0: 12, 1: 23, 2: 200, 3: 24}}

#@perl's answer
%%timeit 
df1 = pd.DataFrame(data) 
df1.merge( 
     df2[['season', 'name']].assign(status=1), 
     how='left').fillna(0)
                                                                       
#5.44 ms ± 56.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

#my answer
%%timeit
df1["status"] = list(zip(df1.season, df1.name))
df1["status"].isin(list(zip(df2.season, df2.name)))

#434 µs ± 4.96 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Old (and wrong) answer

You can use .isin with .to_dict:

cols = ['season', 'name']
df1['status'] = df1[cols].isin(df2[cols].to_dict('list')).all(1).astype('int')

Output

df1
        Type  season         name  qty  status
0      Fruit  summer        Mango   12       1
1      Fruit  summer   watermelon   23       0
2      Fruit  summer  blueberries  200       0
3  vegetable  summer      Peppers   24       1

Upvotes: 0

perl
perl

Reputation: 9941

Here's another option using merge with how='left':

df1.merge(
    df2[['season', 'name']].assign(status=1),
    how='left').fillna(0)

Output:

        Type  season         name  qty  status
0      Fruit  summer        Mango   12     1.0
1      Fruit  summer   watermelon   23     0.0
2      Fruit  summer  blueberries  200     0.0
3  vegetable  summer      Peppers   24     1.0

Upvotes: 4

Related Questions