tjb305

Reputation: 2630

Does pandas != 'a value' return NaNs?

When I use x['test'] = df['a_variable'].str.contains('some string') I get:

True
NaN
NaN
True
NaN

If I use x[x['test'] != True], should I get back the rows where the value is NaN?

Thanks.

Upvotes: 0

Views: 173

Answers (2)

EdChum

Reputation: 394399

Yes, this is expected behaviour:

In [3]:
df = pd.DataFrame({'some_string': ['asdsa', 'some', np.nan, 'string']})
df

Out[3]:
  some_string
0       asdsa
1        some
2         NaN
3      string

In [4]:
df['some_string'].str.contains('some')

Out[4]:
0    False
1     True
2      NaN
3    False
Name: some_string, dtype: object

Using the above as a mask:

In [13]:
df[df['some_string'].str.contains('some') != False]

Out[13]:
  some_string
1        some
2         NaN

The NaN row is kept because NaN != False evaluates to True, so this is expected behaviour.

If you pass na=value to str.contains, the missing entries take that value in the result:

In [6]:
df['some_string'].str.contains('some', na=False)

Out[6]:
0    False
1     True
2    False
3    False
Name: some_string, dtype: bool

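This matters because indexing directly with a mask that still contains NaN can raise an error (a ValueError in recent pandas releases), so set na explicitly before using the result as a mask.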
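As a rough sketch of how the na=False mask can then be used for filtering (the names mask, matches and non_matches are just illustrative, and the frame is the same toy data as above):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'some_string': ['asdsa', 'some', np.nan, 'string']})

# na=False treats the missing entry as "no match", so the mask is plain bool
mask = df['some_string'].str.contains('some', na=False)

matches = df[mask]        # rows whose string contains 'some'
non_matches = df[~mask]   # everything else, including the NaN row

print(matches)
print(non_matches)
```

Because the mask is a plain boolean Series, ~mask is a safe way to invert it, which avoids comparing against True or False at all.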

Upvotes: 2

The6thSense

Reputation: 8335

Yes, that is what we would expect.

For example:

x = pd.DataFrame([True, np.nan, True, np.nan])
print(x)

    0
0   True
1   NaN
2   True
3   NaN

print(x[x[0] != True])

    0
1   NaN
3   NaN

x[x[0] != True] returns every row where the value is not True, and NaN counts as not True.

Likewise:

print(x[x[0] != False])

    0
0   True
1   NaN
2   True
3   NaN

Here the comparison asks for every value that is not False, so both the True rows and the NaN rows come back.
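To make the comparison results explicit, here is a small sketch on a Series with the same values (the variable names are only for illustration). It also shows two alternatives if the NaN rows are not wanted: testing for missing values directly with isna(), or comparing with == instead of !=.

```python
import numpy as np
import pandas as pd

s = pd.Series([True, np.nan, True, np.nan], dtype=object)

# NaN compares unequal to everything, so != is True on the NaN rows
ne_true = s != True     # False, True, False, True
ne_false = s != False   # True, True, True, True

# Select the missing rows explicitly instead of relying on !=
only_nan = s[s.isna()]

# == is False on NaN rows, so this keeps only the real True values
only_true = s[s == True]

print(ne_true.tolist())
print(ne_false.tolist())
print(only_nan.index.tolist())
print(only_true.index.tolist())
```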

Upvotes: 1
