scott martin
scott martin

Reputation: 1293

Pandas - Filtering row based on value

I have a Dataframe as below:

col1,col2,value1,value2
type_1,type_2,,NaN
type_3,type_4,NaN,
type_5,type_6,apples,oranges
type_7,type_8,apples,apples

I am trying to filter out the above dataframe in which value1 is not the same as value2

Expected output:

col1,col2,value1,value2
type_1,type_2,,NaN
type_3,type_4,NaN,
type_5,type_6,apples,oranges

I have done the below:

df = df.query("value1 != value2")

I am having an issue where the rows with null and NaN get missed

Upvotes: 1

Views: 70

Answers (3)

Andy L.
Andy L.

Reputation: 25239

Edit: Since @WenYoBen mentioned it although your sample doesn't show that case. If your data have NaN on either value1 or value2 and you don't want to drop it. Belows will only drop on rows both NaN. I change 1 NaN of your sample to pear to show this case

df.query("value1 != value2").dropna(subset=['value1', 'value2'], how='all')

Out[3056]:
     col1    col2  value1   value2
0  type_1  type_2     NaN     pear
2  type_5  type_6  apples  oranges

np.nan compares against itself always return False. In other words, np.nan != np.nan returns True, so it will include in your output. If you need to filter it out, you just need adding dropna to the output:

df.query("value1 != value2").dropna()

Out[3035]:
     col1    col2  value1   value2
2  type_5  type_6  apples  oranges

Upvotes: 2

BENY
BENY

Reputation: 323226

If you check with np.nan, you will find that NaN always not equal to NaN

np.nan==np.nan
Out[164]: False

So you may need using

df[df.fillna('NaN').eval('value1!=value2')]
Out[170]: 
     col1    col2  value1   value2
0  type_1  type_2              NaN
1  type_3  type_4     NaN         
2  type_5  type_6  apples  oranges

Upvotes: 0

Erfan
Erfan

Reputation: 42886

There are several ways which you can do this:

Using boolean indexing

I will use .ne here which stands for not equal

df[df['value1'].ne(df['value2'])]

     col1    col2  value1   value2
0  type_1  type_2     NaN      NaN
1  type_3  type_4     NaN      NaN
2  type_5  type_6  apples  oranges

.query

df.query('value1 != value2')

     col1    col2  value1   value2
0  type_1  type_2     NaN      NaN
1  type_3  type_4     NaN      NaN
2  type_5  type_6  apples  oranges

Upvotes: 2

Related Questions