Reputation: 419
I am trying to filter the rows having strings and NaN values. The following code does not work
list=['ok','good','fine']
df_new = df[(df['b'].str.contains('|'.join(list))) | (df.b.isnull())]
Example
df = pd.DataFrame({'a':[1,2,np.nan, 4, 3 ], 'b':['ok',1,np.nan, np.nan,'fine' ]})
df
a b
0 1.0 ok
1 2.0 1
2 NaN NaN
3 4.0 NaN
4 3.0 fine
df[(df['b'].str.contains('|'.join(list))) | (df.b.isnull())]
a b
0 1.0 ok
4 3.0 fine
Desired Output
0 1.0 ok
2 NaN NaN
3 4.0 NaN
4 3.0 fine
I know there are many way to achieve this but wondering why this code is not working
Upvotes: 0
Views: 73
Reputation: 12251
Your problem is that df['b'].str.contains('|'.join(l))
produces NaN
s, since a comparison to NaN
here will yield a NaN
in the result. That is,
>>> df['b'].str.contains('|'.join(l))
0 True
1 NaN
2 NaN
3 NaN
4 True
If you logical-or(*) that with the df.b.isnull()
condition, the NaN
s will be interpreted as False
(or probably as NaN
even, but be equivalent to False
):
>>> df['b'].str.contains('|'.join(l)) | df.b.isnull()
0 True
1 False
2 False
3 False
4 True
Hence you lose a few rows.
If you reverse the order for the logical-or, shortcutting for comparisons will kick in: if sides are or-ed, and the left-hand side is True
, there is no need to look at the right-hand side (whether it be False
, True
or NaN
), and thus your rows remain:
>>> df.b.isnull() | df['b'].str.contains('|'.join(l))
0 True
1 False
2 True
3 True
4 True
Hence you'll get your result when you logical-or the sides in reverse order:
>>> df[df.b.isnull() | df.b.str.contains('|'.join(l))]
a b
0 1.0 ok
2 NaN NaN
3 4.0 NaN
4 3.0 fine
(You can remove some parentheses, as well as replace the ['b']
part.)
I don't think this is something to rely on, though. You would be better off, for example, by replacing the NaN
s with a more relevant value (perhaps an empty string), so that the boolean combination works both ways. Because at some point, a trick like the one above, will bite you.
df.b = df.b.fillna("")
replaces the NaN
s with an empty string.
(*) "logical-or" is not a verb that I know of, but for this context, I made it a verb. As some say: verbing weirds language.
Upvotes: 2