Oscar
Oscar

Reputation: 457

Comparison of columns ignoring NaN

I have two columns like this:

    NaN      35078
  33515      33515
    NaN     123841
    NaN     124074
    NaN       5699
   5700       5703
    NaN       7506
    NaN       9142
   9100       9146
    NaN      10736
  14557      14557
    NaN     17229

What I want to do is to compare both columns and get another new column with True or False. The condition is that the values of each column for each row should be different, but without the NaN values.

I should get something like this:

False
False
False
False
False
 True
False
False
 True
False
False
False

What I have is df.loc[:, 'column1'] != df.loc[:, 'column2'], but the NaN values are not ignored. Thanks!

Upvotes: 2

Views: 773

Answers (2)

Quang Hoang
Quang Hoang

Reputation: 150805

If I understand correctly, (None < None) is False so straight comparison works:

df = pd.DataFrame({0: {0: None,  1: 33515.0,  2: None,  3: None,  
                       4: None,  5: 5700.0,  6: None,  7: None,  
                       8: 9100.0,  9: None,  10: 14557.0,  11: None}, 
                   1: {0: 35078,  1: 33515,  2: 123841,  3: 124074,  
                       4: 5699,  5: 5703,  6: 7506,  7: 9142,  8: 9146,  
                       9: 10736,  10: 14557,  11: 17229}})

df[0] < df[1]

yields:

0     False
1     False
2     False
3     False
4     False
5      True
6     False
7     False
8      True
9     False
10    False
11    False
dtype: bool

Upvotes: 0

jezrael
jezrael

Reputation: 863521

Chain another boolean mask with & for bitwise AND with Series.notna:

mask = (df['column1'] != df['column2']) & (df['column1'].notna())
print (mask)
0     False
1     False
2     False
3     False
4     False
5      True
6     False
7     False
8      True
9     False
10    False
11    False
dtype: bool

If need test both columns:

mask = (df['column1'] != df['column2']) & (df[['column1', 'column1']].notna().all(axis=1))

Upvotes: 5

Related Questions