Reputation: 1293
I have a DataFrame as below:
col1,col2,value1,value2
type_1,type_2,,NaN
type_3,type_4,NaN,
type_5,type_6,apples,oranges
type_7,type_8,apples,apples
I am trying to filter the above DataFrame to keep only the rows in which value1 is not the same as value2.
Expected output:
col1,col2,value1,value2
type_1,type_2,,NaN
type_3,type_4,NaN,
type_5,type_6,apples,oranges
I have done the below:
df = df.query("value1 != value2")
I am having an issue where the rows with null and NaN values get missed.
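For reference, the frame above can be reproduced like this (assuming the data is read with pandas.read_csv, which parses both the blank cells and the literal NaN entries as missing values):
import io
import pandas as pd

# rebuild the sample frame; read_csv turns the empty cells and the
# literal "NaN" strings into missing values
data = """col1,col2,value1,value2
type_1,type_2,,NaN
type_3,type_4,NaN,
type_5,type_6,apples,oranges
type_7,type_8,apples,apples"""

df = pd.read_csv(io.StringIO(data))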
Upvotes: 1
Views: 70
Reputation: 25239
Edit: since @WenYoBen mentioned it (although your sample doesn't show that case): if your data has NaN in either value1 or value2 and you don't want to drop those rows, the below will only drop rows where both are NaN. I changed one NaN of your sample to pear to show this case:
df.query("value1 != value2").dropna(subset=['value1', 'value2'], how='all')
Out[3056]:
col1 col2 value1 value2
0 type_1 type_2 NaN pear
2 type_5 type_6 apples oranges
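For reference, the modified sample used above can be reproduced from the original frame like this (a small sketch):
# change value2 of the first row from NaN to 'pear' to demonstrate the case
df.loc[0, 'value2'] = 'pear'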
np.nan compared against itself always returns False. In other words, np.nan != np.nan returns True, so those rows will be included in your output. If you need to filter them out, you just need to add dropna to the output:
df.query("value1 != value2").dropna()
Out[3035]:
col1 col2 value1 value2
2 type_5 type_6 apples oranges
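The same result can also be written as a boolean mask, which makes the NaN handling explicit (a sketch that restricts the NaN check to the two compared columns, whereas plain dropna() drops rows with NaN in any column):
# keep rows where the values differ and neither of the compared columns is missing
mask = (df['value1'] != df['value2']) & df['value1'].notna() & df['value2'].notna()
df[mask]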
Upvotes: 2
Reputation: 323226
If you check with np.nan, you will find that NaN is never equal to NaN:
np.nan==np.nan
Out[164]: False
So you may need to use:
df[df.fillna('NaN').eval('value1!=value2')]
Out[170]:
     col1    col2  value1   value2
0  type_1  type_2              NaN
1  type_3  type_4     NaN
2  type_5  type_6  apples  oranges
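If you prefer not to fill the whole frame, the same comparison can be done on just the two columns (a sketch, assuming the placeholder string 'NaN' does not occur as a genuine value in either column):
filled = df[['value1', 'value2']].fillna('NaN')
df[filled['value1'] != filled['value2']]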
Upvotes: 0
Reputation: 42886
There are several ways in which you can do this:
boolean indexing
I will use .ne here, which stands for not equal:
df[df['value1'].ne(df['value2'])]
col1 col2 value1 value2
0 type_1 type_2 NaN NaN
1 type_3 type_4 NaN NaN
2 type_5 type_6 apples oranges
.query
df.query('value1 != value2')
col1 col2 value1 value2
0 type_1 type_2 NaN NaN
1 type_3 type_4 NaN NaN
2 type_5 type_6 apples oranges
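As discussed in the other answers, both forms keep the rows where value1 and value2 are both missing, because NaN != NaN evaluates to True. If you want to drop those rows as well, one option (a sketch) is to require at least one of the two values to be present:
# same as chaining .dropna(subset=['value1', 'value2'], how='all') after the filter
df[df['value1'].ne(df['value2']) & (df['value1'].notna() | df['value2'].notna())]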
Upvotes: 2