SantoshGupta7
SantoshGupta7

Reputation: 6197

Filter rows based on multiple column value, where the values can be in any column

Say that I have a dataframe with more than 2 columns

df = pd.DataFrame(np.random.randint(0,100,size=(5, 3)), columns=list('ABC'))
df

A   B   C
0   34  11  57
1   51  13  24
2   37  18  96
3   66  62  87
4   79  17  57

Say that I want to filter for rows where B and C contain 13 and 24, but it doesn't matter which column has which.

I can do something like this

crit1 = 13
crit2 = 24

df[

((
    df['B'] == crit1
) 
|
(
    df['C'] == crit1
))
&
((
    df['B'] == crit2
) 
|
(
    df['C'] == crit2
))

]
A   B   C
1   51  13  24

But I'm not sure if it's the most computationally efficient method, since each column is being checked twice.

My next idea would be to create a new column which a list of values from B and C, but I saw that pandas is not most efficient when handling lists in its columns.

Upvotes: 1

Views: 195

Answers (1)

Stuart Berg
Stuart Berg

Reputation: 18151

One option is to use the in operator within DataFrame.query().

In [1]: df
Out[1]:
    A   B   C
0  34  11  57
1  51  13  24
2  37  18  96
3  66  62  87
4  79  17  57

In [2]: crit = (13, 24)
   ...: df.query('B in @crit and C in @crit and B != C')
Out[2]:
    A   B   C
1  51  13  24

Upvotes: 3

Related Questions