Reputation: 2006
I know i can do like below if we are checking only two columns together.
df['flag'] = df['a_id'].isin(df['b_id'])
where df
is a data frame, and a_id
and b_id
are two columns of the data frame. It will return True
or False
value based on the match. But i need to compare multiple columns together.
For example: if there are a_id , a_region, a_ip, b_id, b_region and b_ip
columns. I want to compare like below,
a_key = df['a_id'] + df['a_region] + df['a_ip']
b_key = df['b_id'] + df['b_region] + df['b_ip']
df['flag'] = a_key.isin(b_key)
Somehow the above code is always returning False
value. The output should be like below,
First row flag will be True because there is a match.
a_key
becomes 2a10
this is match with last row of b_key
(2a10)
Upvotes: 6
Views: 8586
Reputation: 2022
You were going in the right direction, just use:
a_key = df['a_id'].astype(str) + df['a_region'] + df['a_ip'].astype(str)
b_key = df['b_id'].astype(str) + df['b_region'] + df['b_ip'].astype(str)
a_key.isin(b_key)
Mine is giving below results:
0 True
1 False
2 False
Upvotes: 9
Reputation: 227
You can use isin
with DataFrame
as value, but as per the docs:
If values is a DataFrame, then both the index and column labels must match
So this should work:
# Removing the prefixes from column names
df_a = df[['a_id', 'a_region', 'a_ip']].rename(columns=lambda x: x[2:])
df_b = df[['b_id', 'b_region', 'b_ip']].rename(columns=lambda x: x[2:])
# Find rows where all values are in the other
matched = df_a.isin(df_b).all(axis=1)
# Get actual rows with boolean indexing
df_a.loc[matched]
# ... or add boolean flag to dataframe
df['flag'] = matched
Upvotes: 3
Reputation: 18647
Here's one approach using DataFrame.merge
, pandas.concat
and testing for duplicated
values:
df_merged = df.merge(df,
left_on=['a_id', 'a_region', 'a_ip'],
right_on=['b_id', 'b_region', 'b_ip'],
suffixes=('', '_y'))
df['flag'] = pd.concat([df, df_merged[df.columns]]).duplicated(keep=False)[:len(df)].values
[out]
a_id a_region a_ip b_id b_region b_ip flag
0 2 a 10 3222222 sssss 22222 True
1 22222 bcccc 10000 43333 ddddd 11111 False
2 33333 acccc 120000 2 a 10 False
Upvotes: 2