user6315578

Error: cannot convert float NaN to integer in pandas

I have this DataFrame:

             a  b    c    d
0          nan  Y  nan  nan
1  1.27838e+06  N    3   96
2          nan  N    2  nan
3       284633  Y  nan   44

I am trying to convert the non-null values to integer type to avoid the exponential notation (1.27838e+06):

f = lambda x: int(x)
df['a'] = np.where(df['a'] == None, np.nan, df['a'].apply(f))

But I still get the error, even though I only want to change the dtype of the non-null values. Can anyone point out my mistake? Thanks.

Upvotes: 6

Views: 16130

Answers (3)

victoremanuxll

Reputation: 16

To keep it simple, you can use the nullable Int64 dtype, for example:

Working example:

df = pd.DataFrame({"id": ["1", "2", "3", "4", np.nan]}, columns=["id"]).astype('float64').astype('Int64')

Non-working example (raises a similar error, because int64 does not support NaN):

df = pd.DataFrame({"id": ["1", "2", "3", "4", np.nan]}, columns=["id"]).astype('float64').astype('int64')

The difference is that Int64 (with a capital I) is pandas' nullable integer dtype: it represents integers but can also hold missing values, like the ones in your case (they are shown as <NA>). So you can fill the values later if needed.
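Applied to the question's data, a minimal sketch could look like this. The values are a hypothetical reconstruction (the displayed 1.27838e+06 is assumed to be the whole number 1278380), and `DataFrame.astype` is given a column-to-dtype dict so that the text column `b` is left untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the question's DataFrame; 1.27838e+06 is
# assumed here to be the whole number 1278380.
df = pd.DataFrame({
    "a": [np.nan, 1278380.0, np.nan, 284633.0],
    "b": ["Y", "N", "N", "Y"],
    "c": [np.nan, 3.0, 2.0, np.nan],
    "d": [np.nan, 96.0, np.nan, 44.0],
})

# Cast only the numeric columns to the nullable Int64 dtype.
# NaN becomes <NA>; every other value must already be a whole number,
# otherwise pandas raises an error (round first in that case).
df = df.astype({"a": "Int64", "c": "Int64", "d": "Int64"})
print(df)
print(df.dtypes)

# Missing values can still be filled later; the result stays Int64.
filled_a = df["a"].fillna(0)
print(filled_a.tolist())
```

Because the values are now real integers, they print without exponential notation.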

Reference: https://github.com/pandas-dev/pandas/issues/27731

Upvotes: 0

Ken Wei

Reputation: 3130

Pandas' default integer dtype (int64, backed by NumPy) cannot store NaN values. Strictly speaking, you could have a column of mixed data types (dtype object), but this is computationally inefficient. So if you insist, you can do:

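mask = df['a'].notnull()                           # rows that hold an actual number
df['a'] = df['a'].astype(object)                   # an object column can mix ints and NaN
df.loc[mask, 'a'] = df.loc[mask, 'a'].astype(int)  # convert only the non-null values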
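As for the error in the original attempt: `np.where` receives its arguments already evaluated, so `df['a'].apply(f)` calls `int()` on every element, NaN included, before any selection happens. In addition, `df['a'] == None` never matches, because missing values are stored as NaN rather than None; `isna()`/`notnull()` is the reliable test. The sketch below uses a small hypothetical column to reproduce the error and then applies the masked conversion from this answer.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's column "a".
df = pd.DataFrame({"a": [np.nan, 1278380.0, np.nan, 284633.0]})

# Comparing with None never matches NaN, so this check is False.
matches_none = (df["a"] == None).any()  # noqa: E711
print("any == None:", matches_none)

# np.where gets df["a"].apply(int) already computed, so int() still sees NaN.
try:
    np.where(df["a"].isna(), np.nan, df["a"].apply(int))
    where_failed = False
except ValueError as exc:
    where_failed = True
    print("np.where attempt failed:", exc)

# Converting only the masked (non-null) rows never calls int() on NaN.
mask = df["a"].notnull()
df["a"] = df["a"].astype(object)
df.loc[mask, "a"] = df.loc[mask, "a"].astype(int)
print(df["a"].tolist())
```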

Upvotes: 6

lmiguelvargasf

Reputation: 69725

As far as I have read in the pandas documentation, it is not possible to represent an integer NaN:

"In the absence of high performance NA support being built into NumPy from the ground up, the primary casualty is the ability to represent NAs in integer arrays."

As the documentation explains further on, this is due to memory and performance reasons, and also so that the resulting Series stays “numeric” (an integer array with missing values is upcast to float64). One possibility is to use dtype=object arrays instead.
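The float64 upcasting behind this limitation is easy to see directly. This small sketch uses arbitrary example values: both a NumPy array and a default Series turn into float64 once a missing value is present, while `dtype=object` keeps the integers as integers (at the cost of the performance mentioned in the quote).

```python
import numpy as np
import pandas as pd

# NumPy has no integer NaN: mixing ints with np.nan yields a float64 array.
arr = np.array([1278380, np.nan])
print(arr.dtype)

# pandas behaves the same by default: a missing value upcasts ints to float64.
upcast = pd.Series([1278380, 284633, None])
print(upcast.dtype)

# With dtype=object each element stays its own Python object, so the ints
# stay ints and the missing value is kept as None.
as_object = pd.Series([1278380, 284633, None], dtype=object)
print(as_object.tolist())
```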

Upvotes: 1
