James Owers
James Owers

Reputation: 8335

Select rows from a pandas dataframe where two columns match list of pairs

I'm trying to create a boolean mask (or list of indices) from a dataframe to indicate where multiple columns match some combinations in a list. Here's an example:

import pandas as pd
df = pd.DataFrame({'A': ['alice', 'bob'    , 'charlie'  , 'dave' , 'dave'], 
                   'B': ['andy' , 'bridget', 'charlotte', 'diana', 'andy'], 
                   'C': ['some' , 'other'  , 'stuff'    , 'here' , '!' ]})

pairs = pd.DataFrame({'A': ['alice', 'dave'],
                      'B': ['andy' , 'diana']})

My desired output is either

[True, False, False, True, False]

or

[0, 3]

Critically, I don't want to return row index 4 i.e. ['dave', 'andy', '!']. I can achieve what I want by converting back to lists...but this feels like a long way round and I imagine there's a 'pandas' way to do this!

df_list = df[['A', 'B']].values.tolist()
pairs_list = pairs.values.tolist()
[idx for idx, row in enumerate(df_list) if row in pairs_list]

Upvotes: 8

Views: 5042

Answers (1)

EdChum
EdChum

Reputation: 393963

You could perform a outer type merge with indicator=True param and test whether _merge column == 'both':

In [97]:
merged = df.merge(pairs, how='outer', indicator=True)
merged[merged['_merge'] =='both'].index

Out[97]:
Int64Index([0, 3], dtype='int64')

To get a boolean Series:

In [98]:
merged['_merge'] =='both'

Out[98]:
0     True
1    False
2    False
3     True
4    False
Name: _merge, dtype: bool

the merged df looks like this:

In [99]:
merged

Out[99]:
         A          B      C     _merge
0    alice       andy   some       both
1      bob    bridget  other  left_only
2  charlie  charlotte  stuff  left_only
3     dave      diana   here       both
4     dave       andy      !  left_only

Upvotes: 11

Related Questions