Reputation: 6197
Say that I have a dataframe with more than 2 columns
df = pd.DataFrame(np.random.randint(0,100,size=(5, 3)), columns=list('ABC'))
df
    A   B   C
0  34  11  57
1  51  13  24
2  37  18  96
3  66  62  87
4  79  17  57
Say that I want to filter for rows where B and C contain 13 and 24, but it doesn't matter which column has which.
I can do something like this:
crit1 = 13
crit2 = 24
df[
    ((df['B'] == crit1) | (df['C'] == crit1))
    & ((df['B'] == crit2) | (df['C'] == crit2))
]
    A   B   C
1  51  13  24
But I'm not sure if it's the most computationally efficient method, since each column is being checked twice.
My next idea would be to create a new column holding a list of the values from B and C, but I've read that pandas is not very efficient at handling lists in its columns.
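For what it's worth, that idea might be sketched roughly like this (assuming the same df as above; the BC column name and the set-per-row representation are just for illustration):

import pandas as pd

# Same example frame as above.
df = pd.DataFrame({'A': [34, 51, 37, 66, 79],
                   'B': [11, 13, 18, 62, 17],
                   'C': [57, 24, 96, 87, 57]})

crit = {13, 24}
# Store the B/C pair of each row as a set in an object column, then
# compare each set against the target pair. Both steps run row by row
# at Python speed, which is the efficiency concern mentioned above.
df['BC'] = df[['B', 'C']].apply(frozenset, axis=1)
df[df['BC'].map(lambda s: s == crit)]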
Upvotes: 1
Views: 195
Reputation: 18151
One option is to use the in operator within DataFrame.query().
In [1]: df
Out[1]:
    A   B   C
0  34  11  57
1  51  13  24
2  37  18  96
3  66  62  87
4  79  17  57
In [2]: crit = (13, 24)
...: df.query('B in @crit and C in @crit and B != C')
Out[2]:
    A   B   C
1  51  13  24
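For comparison, the same logic can be written as a plain boolean mask (a sketch, assuming the question's df); isin() checks each of B and C only once against the pair of criteria, and the B != C guard makes sure the two criteria land in different columns:

import pandas as pd

# Same example frame as in the question.
df = pd.DataFrame({'A': [34, 51, 37, 66, 79],
                   'B': [11, 13, 18, 62, 17],
                   'C': [57, 24, 96, 87, 57]})

crit = (13, 24)
# Both B and C must be one of the two criteria, and they must differ,
# so one column holds 13 and the other holds 24.
mask = df[['B', 'C']].isin(crit).all(axis=1) & (df['B'] != df['C'])
df[mask]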
Upvotes: 3