biwia
biwia

Reputation: 397

use isin inside groupby

I have two dataframes df1 is:

    uid   a   b
0   111   i1   2
1   111   i2   3
2   111   i4   5
3   111   i6   7
4   111   i5   8
5   222   i10   9
6   222   i11  12
7   222   i13  11
8   222   i2   1
9   333   i14  16
10  333   i3   2
11  333   i16  19
12  333   i3   4
13  444   i21  20
14  444   i9   5

and my df2:

    uid   a   
0   111   i1   
1   111   i2   
2   222   i4   
3   222   i11  
4   222   i13  
5   222   i2   

I want to check if the df2.a is in df1.a group by uid, if it exists then I want to create column "new" and put 1 in this column

My expected output is:

    uid   a   b.  new 
0   111   i1   2.  1
1   111   i2   3.  1
2   111   i4   5   0
3   111   i6   7.  0
4   111   i5   8.  0
5   222   i10   9. 0
6   222   i11  12. 1
7   222   i13  11. 1
8   222   i2   1.  0
9   333   i14  16. 0
10  333   i3   2.  0
11  333   i16  19. 0
12  333   i3   4.  0
13  444   i21  20. 0
14  444   i9   5.  0

Does anyone knows how to do it please?

Upvotes: 1

Views: 115

Answers (4)

BENY
BENY

Reputation: 323316

Let us do assign

out = df1.merge(df2.assign(new=1),how='left').fillna({'new':0})
Out[255]: 
    uid    a   b  new
0   111   i1   2  1.0
1   111   i2   3  1.0
2   111   i4   5  0.0
3   111   i6   7  0.0
4   111   i5   8  0.0
5   222  i10   9  0.0
6   222  i11  12  1.0
7   222  i13  11  1.0
8   222   i2   1  1.0
9   333  i14  16  0.0
10  333   i3   2  0.0
11  333  i16  19  0.0
12  333   i3   4  0.0
13  444  i21  20  0.0
14  444   i9   5  0.0

Upvotes: 3

ifly6
ifly6

Reputation: 5331

I would left merge with an indicator.

>>> df1.merge(df2, on=['uid', 'a'], how='left', indicator=True)

    uid    a   b     _merge
0   111   i1   2       both
1   111   i2   3       both
2   111   i4   5  left_only
3   111   i6   7  left_only
4   111   i5   8  left_only
5   222  i10   9  left_only
6   222  i11  12       both
7   222  i13  11       both
8   222   i2   1       both
9   333  i14  16  left_only
10  333   i3   2  left_only
11  333  i16  19  left_only
12  333   i3   4  left_only
13  444  i21  20  left_only
14  444   i9   5  left_only

Then convert the indicator column to your variable, renaming it df.rename(columns={'_merge', 'new'}, inplace=True) and then replacing elements in the column with df['_merge'].replace({'both': 1, 'left_only': 0}, inplace=True).

Upvotes: 0

Arkadiusz
Arkadiusz

Reputation: 1875

Use set_index(), isin() and display results as integers:

df1['new'] = (df1.set_index(['uid', 'a']).index
                 .isin(df2.set_index(['uid', 'a']).index)
                 .astype(int)
             )

Output:

    uid    a     b  new
0   111   i1     2    1
1   111   i2     3    1
2   111   i4     5    0
3   111   i6     7    0
4   111   i5     8    0
5   222  i10     9    0
6   222  i11    12    1
7   222  i13    11    1
8   222   i2     1    1
9   333  i14    16    0
10  333   i3     2    0
11  333  i16    19    0
12  333   i3     4    0
13  444  i21    20    0
14  444   i9     5    0

Upvotes: 0

Henry Ecker
Henry Ecker

Reputation: 35676

Try a left merge with indicator then replace the indicator with 1 or 0 based on values in both tables with something like np.where:

new_df = pd.merge(df1, df2, on=['uid', 'a'], how='left', indicator='new')
new_df['new'] = np.where(new_df['new'].eq('both'), 1, 0)

Option 2: get boolean values with eq + astype:

new_df = pd.merge(df1, df2, on=['uid', 'a'], how='left', indicator='new')
new_df['new'] = new_df['new'].eq('both').astype(int)

new_df:

    uid    a   b  new
0   111   i1   2    1
1   111   i2   3    1
2   111   i4   5    0
3   111   i6   7    0
4   111   i5   8    0
5   222  i10   9    0
6   222  i11  12    1
7   222  i13  11    1
8   222   i2   1    1
9   333  i14  16    0
10  333   i3   2    0
11  333  i16  19    0
12  333   i3   4    0
13  444  i21  20    0
14  444   i9   5    0

Upvotes: 1

Related Questions