Reputation: 397
I have two dataframes df1
is:
uid a b
0 111 i1 2
1 111 i2 3
2 111 i4 5
3 111 i6 7
4 111 i5 8
5 222 i10 9
6 222 i11 12
7 222 i13 11
8 222 i2 1
9 333 i14 16
10 333 i3 2
11 333 i16 19
12 333 i3 4
13 444 i21 20
14 444 i9 5
and my df2:
uid a
0 111 i1
1 111 i2
2 222 i4
3 222 i11
4 222 i13
5 222 i2
I want to check if the df2.a
is in df1.a
group by uid
, if it exists then I want to create column "new"
and put 1 in this column
My expected output is:
uid a b. new
0 111 i1 2. 1
1 111 i2 3. 1
2 111 i4 5 0
3 111 i6 7. 0
4 111 i5 8. 0
5 222 i10 9. 0
6 222 i11 12. 1
7 222 i13 11. 1
8 222 i2 1. 0
9 333 i14 16. 0
10 333 i3 2. 0
11 333 i16 19. 0
12 333 i3 4. 0
13 444 i21 20. 0
14 444 i9 5. 0
Does anyone knows how to do it please?
Upvotes: 1
Views: 115
Reputation: 323316
Let us do assign
out = df1.merge(df2.assign(new=1),how='left').fillna({'new':0})
Out[255]:
uid a b new
0 111 i1 2 1.0
1 111 i2 3 1.0
2 111 i4 5 0.0
3 111 i6 7 0.0
4 111 i5 8 0.0
5 222 i10 9 0.0
6 222 i11 12 1.0
7 222 i13 11 1.0
8 222 i2 1 1.0
9 333 i14 16 0.0
10 333 i3 2 0.0
11 333 i16 19 0.0
12 333 i3 4 0.0
13 444 i21 20 0.0
14 444 i9 5 0.0
Upvotes: 3
Reputation: 5331
I would left merge with an indicator.
>>> df1.merge(df2, on=['uid', 'a'], how='left', indicator=True)
uid a b _merge
0 111 i1 2 both
1 111 i2 3 both
2 111 i4 5 left_only
3 111 i6 7 left_only
4 111 i5 8 left_only
5 222 i10 9 left_only
6 222 i11 12 both
7 222 i13 11 both
8 222 i2 1 both
9 333 i14 16 left_only
10 333 i3 2 left_only
11 333 i16 19 left_only
12 333 i3 4 left_only
13 444 i21 20 left_only
14 444 i9 5 left_only
Then convert the indicator column to your variable, renaming it df.rename(columns={'_merge', 'new'}, inplace=True)
and then replacing elements in the column with df['_merge'].replace({'both': 1, 'left_only': 0}, inplace=True)
.
Upvotes: 0
Reputation: 1875
Use set_index()
, isin()
and display results as integers:
df1['new'] = (df1.set_index(['uid', 'a']).index
.isin(df2.set_index(['uid', 'a']).index)
.astype(int)
)
Output:
uid a b new
0 111 i1 2 1
1 111 i2 3 1
2 111 i4 5 0
3 111 i6 7 0
4 111 i5 8 0
5 222 i10 9 0
6 222 i11 12 1
7 222 i13 11 1
8 222 i2 1 1
9 333 i14 16 0
10 333 i3 2 0
11 333 i16 19 0
12 333 i3 4 0
13 444 i21 20 0
14 444 i9 5 0
Upvotes: 0
Reputation: 35676
Try a left merge
with indicator then replace the indicator with 1
or 0
based on values in both tables with something like np.where
:
new_df = pd.merge(df1, df2, on=['uid', 'a'], how='left', indicator='new')
new_df['new'] = np.where(new_df['new'].eq('both'), 1, 0)
Option 2: get boolean values with eq
+ astype
:
new_df = pd.merge(df1, df2, on=['uid', 'a'], how='left', indicator='new')
new_df['new'] = new_df['new'].eq('both').astype(int)
new_df
:
uid a b new
0 111 i1 2 1
1 111 i2 3 1
2 111 i4 5 0
3 111 i6 7 0
4 111 i5 8 0
5 222 i10 9 0
6 222 i11 12 1
7 222 i13 11 1
8 222 i2 1 1
9 333 i14 16 0
10 333 i3 2 0
11 333 i16 19 0
12 333 i3 4 0
13 444 i21 20 0
14 444 i9 5 0
Upvotes: 1