Reputation: 358
I'm trying to filter data with multiple conditions using .isin
I've created a dataframe with data like this.
col_a col_b col_c
abc yes a
abc no b
abc yes a
def no b
def yes a
def no b
def yes a
def no b
ghi yes a
ghi no b
ghi yes a
When I try this type of filtering, referring to this solution I seen on stack overflow I get back all NaN values. Pandas: Filtering multiple conditions
How can I apply the three conditions to filter?
fil_1 = test.isin({'col_a': ['abc','def','ghi']})
fil_2 = test.isin({'col_b': ['yes']})
fil_3 = test.isin({'col_c' :['a']})
data = test[fil_1 & fil_2 & fil_3]
data
Upvotes: 0
Views: 300
Reputation: 30940
You need:
fil_1 = test['col_a'].isin(['abc','def','ghi'])
fil_2 = test['col_b'].isin(['yes'])
fil_3 = test['col_c'].isin(['a'])
or
test.isin({'col_a': ['abc','def','ghi'],
'col_b': ['yes'],
'col_c' :['a']}).all(axis = 1)
df_filtered = test[fil_1 & fil_2 & fil_3]
print(df_filtered)
col_a col_b col_c
0 abc yes a
2 abc yes a
4 def yes a
6 def yes a
8 ghi yes a
10 ghi yes a
or logic |
fil = test.isin({'col_a': ['abc','def','ghi'],'col_b': ['yes'],'col_c' :['a']})
df_filtered = df[fil]
print(df_filtered)
col_a col_b col_c
0 abc yes a
1 abc NaN NaN
2 abc yes a
3 def NaN NaN
4 def yes a
5 def NaN NaN
6 def yes a
7 def NaN NaN
8 ghi yes a
9 ghi NaN NaN
10 ghi yes a
Now if we also use DataFrame.all
:
df_filtered = df[fil.all(axis = 1)]
print(df_filtered)
col_a col_b col_c
0 abc yes a
2 abc yes a
4 def yes a
6 def yes a
8 ghi yes a
10 ghi yes a
Detail
print(fil)
col_a col_b col_c
0 True True True
1 True False False
2 True True True
3 True False False
4 True True True
5 True False False
6 True True True
7 True False False
8 True True True
9 True False False
10 True True True
print(test.isin({'col_a': ['abc','def','ghi']}))
col_a col_b col_c
0 True False False
1 True False False
2 True False False
3 True False False
4 True False False
5 True False False
6 True False False
7 True False False
8 True False False
9 True False False
10 True False False
this return False
in columns differences than col_a
so you got NaN
values because you were using &
Upvotes: 1
Reputation: 939
A possible solution to filter your dataframe is the following: "cond1" select all the values in col_a that are either "abc" or "def" or "ghi". Then col_b is only "yes", and col_c is only "a".
cond1=(apd.col_a=="abc") | (apd.col_a=="def") | (apd.col_a=="ghi")
apd[ cond1 & (apd.col_b=="yes") & (apd.col_c=="a")]
Result:
col_a col_b col_c
0 abc yes a
2 abc yes a
4 def yes a
6 def yes a
8 ghi yes a
10 ghi yes a
Upvotes: 0
Reputation: 3349
Here's the one-liner solution,
test[test.col_a.isin(['abc','def','ghi']) & test.col_b.isin(['yes']) & test.col_c.isin(['a'])]
Upvotes: 0