Extratoro

Reputation: 15

Numpy NaN format in array not considered null

I am trying to populate a dataframe with the following code:

df = pd.DataFrame(data=np.random.choice([1, np.nan], size=5))


0     1  
1     1  
2   NaN  
3     1  
4     1   

Then:

df[df[0].isnull()]

2   NaN

So far, so good. But if I change the 1 to '1', things get strange (IMO).

df = pd.DataFrame(data=np.random.choice(['1', np.nan], size=5))

0    1  
1    1  
2    1  
3    1  
4  nan  

The problem shows up with isnull:

df[df[0].isnull()]

Empty DataFrame  
Columns: [0]  
Index: []

How can I get the nan (which is a string) to behave like a NaN? I want to be able to filter quickly on all null/non-null values within my dataframe.

Thanks.

Upvotes: 0

Views: 776

Answers (1)

Peque

Reputation: 14801

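NaN is a concept that makes sense when working with numbers, not strings. When you pass ['1', np.nan] to np.random.choice, NumPy converts the list to an array first, and because the list mixes a string with a float it picks a single string dtype for the whole array, which IMO is reasonable. The NaN is therefore converted to its string representation, 'nan', before pandas ever sees it, so the column contains only strings and isnull finds nothing.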

Note that if, for example, you say:

df = pd.DataFrame(data=np.random.choice(['1', 2], size=5))

The 2 will be converted to a string as well, for the same reason: the whole array ends up with a single string type.
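To see where the conversion happens, you can inspect the array before it reaches the DataFrame. This is a minimal sketch that calls np.array directly on the same lists, on the assumption that np.random.choice coerces its first argument the same way; the variable names are only for illustration.

```python
import numpy as np

# A list mixing a string with a float is coerced to one string dtype.
mixed_nan = np.array(['1', np.nan])
print(mixed_nan.dtype.kind)      # 'U' means a Unicode string dtype
print(mixed_nan[1] == 'nan')     # the float NaN has become the text 'nan'

# The same coercion applies to an integer mixed with a string.
mixed_int = np.array(['1', 2])
print(mixed_int[1] == '2')
```

Since the array already holds plain strings, there is no missing value left for isnull to detect by the time the DataFrame is built.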

However, you can still filter easily with your proposed dataframe:

df = pd.DataFrame(data=np.random.choice(['1', np.nan], size=5))
df[df[0] == 'nan']
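If you would rather have isnull work as usual, two possible approaches are sketched below. The first replaces the literal string 'nan' with a real missing value after the fact; the second keeps NumPy from coercing the values by handing np.random.choice an object-dtype array. The fixed sample data and the names df_fixed, choices and df_obj are my own, chosen so the result is reproducible.

```python
import numpy as np
import pandas as pd

# Approach 1: turn the string 'nan' back into a real missing value.
# Fixed data stands in for np.random.choice so the output is predictable.
df_fixed = pd.DataFrame(data=np.array(['1', np.nan, '1']))
df_fixed[0] = df_fixed[0].replace('nan', np.nan)
null_rows = df_fixed[df_fixed[0].isnull()]
print(null_rows)

# Approach 2: avoid the coercion by choosing from an object-dtype array,
# so np.nan stays a float and '1' stays a string.
choices = np.array(['1', np.nan], dtype=object)
df_obj = pd.DataFrame(data=np.random.choice(choices, size=5))
print(df_obj[df_obj[0].isnull()])
```

Note that the first approach would also blank out any genuine 'nan' text in the column, so the second is safer if that string can legitimately occur in your data.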

Upvotes: 1
