Reputation: 8335
I'm trying to create a boolean mask (or list of indices) from a dataframe to indicate where multiple columns match some combinations in a list. Here's an example:
import pandas as pd
df = pd.DataFrame({'A': ['alice', 'bob' , 'charlie' , 'dave' , 'dave'],
'B': ['andy' , 'bridget', 'charlotte', 'diana', 'andy'],
'C': ['some' , 'other' , 'stuff' , 'here' , '!' ]})
pairs = pd.DataFrame({'A': ['alice', 'dave'],
'B': ['andy' , 'diana']})
My desired output is either
[True, False, False, True, False]
or
[0, 3]
Critically, I don't want to return row index 4 i.e. ['dave', 'andy', '!']
. I can achieve what I want by converting back to lists...but this feels like a long way round and I imagine there's a 'pandas' way to do this!
df_list = df[['A', 'B']].values.tolist()
pairs_list = pairs.values.tolist()
[idx for idx, row in enumerate(df_list) if row in pairs_list]
Upvotes: 8
Views: 5042
Reputation: 393963
You could perform a outer
type merge with indicator=True
param and test whether _merge
column == 'both'
:
In [97]:
merged = df.merge(pairs, how='outer', indicator=True)
merged[merged['_merge'] =='both'].index
Out[97]:
Int64Index([0, 3], dtype='int64')
To get a boolean Series
:
In [98]:
merged['_merge'] =='both'
Out[98]:
0 True
1 False
2 False
3 True
4 False
Name: _merge, dtype: bool
the merged df looks like this:
In [99]:
merged
Out[99]:
A B C _merge
0 alice andy some both
1 bob bridget other left_only
2 charlie charlotte stuff left_only
3 dave diana here both
4 dave andy ! left_only
Upvotes: 11