Reputation: 2630
When I use
x['test'] = df['a_variable'].str.contains('some string')
I get:
True
NaN
NaN
True
NaN
If I then use
x[x['test'] != True]
should I get back the rows whose value is NaN?
Thanks.
Upvotes: 0
Views: 173
Reputation: 394399
Yes, this is expected behaviour:
In [3]:
df = pd.DataFrame({'some_string':['asdsa','some',np.nan, 'string']})
df
Out[3]:
some_string
0 asdsa
1 some
2 NaN
3 string
In [4]:
df['some_string'].str.contains('some')
Out[4]:
0 False
1 True
2 NaN
3 False
Name: some_string, dtype: object
Using the above as a mask keeps the NaN row, because NaN != False evaluates to True:
In [13]:
df[df['some_string'].str.contains('some') != False]
Out[13]:
some_string
1 some
2 NaN
So the above is expected behaviour.
If you pass na=value, missing values are replaced by that value in the result, so you get whatever you set in place of NaN:
In [6]:
df['some_string'].str.contains('some', na=False)
Out[6]:
0 False
1 True
2 False
3 False
Name: some_string, dtype: bool
This matters because indexing directly with a mask that still contains NaN values raises an error (a ValueError in recent pandas versions).
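A minimal sketch of the na=False approach, assuming the same example frame as above. The names mask, matches and non_matches are illustrative only and do not come from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'some_string': ['asdsa', 'some', np.nan, 'string']})

# na=False treats missing values as "no match", so the mask has no NaN in it
mask = df['some_string'].str.contains('some', na=False)

matches = df[mask]        # rows whose text contains 'some'
non_matches = df[~mask]   # everything else, including the NaN row
```

Because the mask holds only True and False, it can be negated with ~ and used for indexing without comparing against True or False.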
Upvotes: 2
Reputation: 8335
Yes, we would expect that to happen. For example:
x = pd.DataFrame([True, np.nan, True, np.nan])
print(x)
0
0 True
1 NaN
2 True
3 NaN
print(x[x[0] != True])
0
1 NaN
3 NaN
x[x[0] != True]
returns everything where the value is not True, and NaN counts as not True.
Likewise:
print(x[x[0] != False])
0
0 True
1 NaN
2 True
3 NaN
This returns every row, because the condition keeps all values that are not False, and NaN is not False.
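To pick out the NaN rows on purpose rather than as a side effect of !=, one option is to test for missing values directly. This is a sketch under the assumption that the column is an object-dtype mix of True and NaN as in the example. The variable names are hypothetical.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame([True, np.nan, True, np.nan])

# NaN compares unequal to every value, so both masks are True on NaN rows
not_true = x[0] != True
not_false = x[0] != False

# Select only the missing rows by testing for NaN explicitly
missing_only = x[x[0].isna()]

# Select only the genuine True rows with ==, which is False for NaN
true_only = x[x[0] == True]
```

Using isna() makes the intent explicit and avoids relying on how NaN behaves in comparisons.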
Upvotes: 1