I am trying to populate a dataframe with the following code:
import numpy as np
import pandas as pd
df = pd.DataFrame(data=np.random.choice([1, np.nan], size=5))
0 1
1 1
2 NaN
3 1
4 1
Then:
df[df[0].isnull()]
2 NaN
So far, so good. But if I change the 1 to '1', things get strange (IMO).
df = pd.DataFrame(data=np.random.choice(['1', np.nan], size=5))
0 1
1 1
2 1
3 1
4 nan
The problem comes with isnull():
df[df[0].isnull()]
Empty DataFrame
Columns: [0]
Index: []
How can I get the nan (which is a string) to behave like a NaN? I want to be able to filter quickly on all null/non-null values within my dataframe.
Thanks.
Answer:
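NaN is a concept that makes sense when working with numbers, not strings. When you create the dataframe with '1's, the type of that column is inferred as str, which IMO is correct. (Strictly speaking, the conversion already happens in NumPy: np.random.choice turns ['1', np.nan] into a single array of strings before Pandas ever sees it.) So the NaN values end up as their string representation, 'nan'.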
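To see where the string comes from, here is a small check of the array that np.random.choice builds from ['1', np.nan]. It is a sketch assuming a reasonably recent NumPy; the variable name values is just illustrative.

```python
import numpy as np

# np.random.choice first turns the list into one NumPy array; mixing a str
# with a float gives a unicode string dtype, so np.nan becomes the text 'nan'
values = np.random.choice(['1', np.nan], size=5)

print(values.dtype.kind)                     # U (a unicode string array)
print(np.array(['1', np.nan])[1])            # nan (a string, not a float NaN)
print(set(values.tolist()) <= {'1', 'nan'})  # True
```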
Note that if, for example, you say:
df = pd.DataFrame(data=np.random.choice(['1', 2], size=5))
the 2 will be converted to a string as well because, again, a single string type is inferred for the whole column.
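The same behaviour can be checked for the ['1', 2] case. This is a sketch assuming pandas and NumPy are installed; it only inspects the cells of the resulting column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.random.choice(['1', 2], size=5))

# The integer 2 was stringified together with '1', so every cell is a str
print(df[0].tolist())
print(all(isinstance(v, str) for v in df[0]))  # True
```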
However, you can still filter easily with your proposed dataframe:
df = pd.DataFrame(data=np.random.choice(['1', np.nan], size=5))
df[df[0] == 'nan']
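If you would rather have isnull() itself work, as the question asks, here is a minimal sketch of two options, assuming a reasonably recent pandas and NumPy: replace the string 'nan' with a real NaN after the fact, or draw from an object-dtype array so np.nan is never stringified. The names df_fixed, choices and df2 are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.random.choice(['1', np.nan], size=5))

# Option 1: turn the literal string 'nan' back into a real missing value
df_fixed = df.replace('nan', np.nan)
print(df_fixed[df_fixed[0].isnull()])

# Option 2: avoid the conversion by drawing from an object array,
# so np.nan stays a float NaN instead of becoming the string 'nan'
choices = np.array(['1', np.nan], dtype=object)
df2 = pd.DataFrame(data=np.random.choice(choices, size=5))
print(df2[df2[0].isnull()])
```

The first option keeps your existing construction code; the second fixes the problem at the source, at the cost of an object-dtype column.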