jss367

Reputation: 5401

Unable to remove NaN from pandas Series

I know this question has been asked many times before, but all the solutions I have found don't seem to be working for me. I am unable to remove the NaN values from my pandas Series or DataFrame.

First, I tried working directly on the DataFrame, as in In [7] and In [8] of the documentation (http://pandas.pydata.org/pandas-docs/stable/missing_data.html):

In [1]:
df['salary'][:5]
Out[1]:
0    365788
1    267102
2    170941
3       NaN
4    243293

In [2]:
pd.isnull(df['salary'][:5])
Out[2]:
0    False
1    False
2    False
3    False
4    False

I was expecting row 3 to show up as True, but it didn't. I pulled the Series out of the DataFrame to try again.

sal = df['salary'][:5]

In [100]:
type(sal)
Out[100]:
pandas.core.series.Series

In [101]:
sal.isnull()
Out[101]:
0    False
1    False
2    False
3    False
4    False
Name: salary, dtype: bool

In [102]:
sal.dropna()
Out[102]:
0    365788
1    267102
2    170941
3       NaN
4    243293
Name: salary, dtype: object

Can someone tell me what I'm doing wrong? I am using IPython Notebook 2.2.0.

Upvotes: 2

Views: 581

Answers (1)

jakevdp

Reputation: 86443

The datatype of your column is object, which tells me it probably contains strings rather than numerical values. Try converting to float:

>>> sa1 = pd.Series(["365788", "267102", "170941", "NaN", "243293"])
>>> sa1
0    365788
1    267102
2    170941
3       NaN
4    243293
dtype: object

>>> sa1.isnull()
0    False
1    False
2    False
3    False
4    False
dtype: bool

>>> sa1 = sa1.astype(float)
>>> sa1.isnull()
0    False
1    False
2    False
3     True
4    False
dtype: bool
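
If some entries might not parse as numbers at all, a possible alternative to astype(float) is pd.to_numeric with errors="coerce", which turns unparseable values into real NaN so that dropna() can then remove them. The sketch below is only an illustration under assumptions: the data is made up to mirror the question, and the names sal, sal_num and cleaned are hypothetical.

```python
import pandas as pd

# Hypothetical data mirroring the question: numbers stored as strings,
# with the literal text "NaN" in place of a real missing value.
sal = pd.Series(
    ["365788", "267102", "170941", "NaN", "243293"],
    name="salary",
    dtype=object,
)

# With object dtype, the string "NaN" is an ordinary value,
# so isnull() flags nothing and dropna() removes nothing.
print(sal.isnull().any())

# Convert to numbers; errors="coerce" replaces anything that cannot be
# parsed with a real NaN instead of raising an error.
sal_num = pd.to_numeric(sal, errors="coerce")
print(sal_num.isnull())

# Now dropna() has a real missing value to remove (row 3).
cleaned = sal_num.dropna()
print(cleaned)
```

To apply this to the original DataFrame, the converted column would need to be assigned back, for example df['salary'] = pd.to_numeric(df['salary'], errors='coerce'), before calling df.dropna(subset=['salary']).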

Upvotes: 4
