Reputation:
I have the dataframe:
             a  b    c    d
0          nan  Y  nan  nan
1  1.27838e+06  N    3   96
2          nan  N    2  nan
3       284633  Y  nan   44
I am trying to convert the non-null data to integer type to avoid the exponential notation (1.27838e+06):
f=lambda x : int(x)
df['a']=np.where(df['a']==None,np.nan,df['a'].apply(f))
But I get an error, even though I only want to change the dtype of the non-null values. Can anyone point out my error? Thanks.
Upvotes: 6
Views: 16130
Reputation: 16
You can use the nullable Int64 datatype to keep it simple:
Working example:
df = pd.DataFrame({"id": ["1", "2", "3", "4", np.nan]}, columns=["id"]).astype('float64').astype('Int64')
Non-working example (raises the same error as yours, because int64 doesn't support NaNs):
df = pd.DataFrame({"id": ["1", "2", "3", "4", np.nan]}, columns=["id"]).astype('float64').astype('int64')
The difference is just that Int64 is more flexible: it represents integers but can also hold NaN, as in your case, so you can fill in the missing values later if needed.
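As a minimal sketch of this applied to a column like your 'a' (the values below are reconstructed from your sample, so treat them as an assumption), note that after the Int64 conversion the values also display as plain integers instead of scientific notation:

```python
import numpy as np
import pandas as pd

# Reconstruction of column 'a' from the question (values assumed)
df = pd.DataFrame({"a": [np.nan, 1278380.0, np.nan, 284633.0]})

# Convert the float column to the nullable Int64 dtype; NaN becomes pd.NA
df["a"] = df["a"].astype("Int64")

print(df["a"].dtype)  # Int64
print(df["a"])        # 1278380 and 284633 print as integers, missing values as <NA>
```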
Reference: https://github.com/pandas-dev/pandas/issues/27731
Upvotes: 0
Reputation: 3130
Pandas doesn't have the ability to store NaN values for integers. Strictly speaking, you could have a column with mixed data types, but this can be computationally inefficient. So if you insist, you can do
df['a'] = df['a'].astype('O')
df.loc[df['a'].notnull(), 'a'] = df.loc[df['a'].notnull(), 'a'].astype(int)
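To illustrate, here is a self-contained sketch of those two lines applied to a column reconstructed from the question's sample data (the values are assumed):

```python
import numpy as np
import pandas as pd

# Reconstruction of column 'a' from the question (values assumed)
df = pd.DataFrame({"a": [np.nan, 1278380.0, np.nan, 284633.0]})

# Cast to object so the column can hold Python ints and NaN side by side
df['a'] = df['a'].astype('O')
mask = df['a'].notnull()
df.loc[mask, 'a'] = df.loc[mask, 'a'].astype(int)

print(df['a'].tolist())  # NaNs survive, non-null values are now integers
```

The trade-off mentioned above applies: an object column stores boxed Python values, so vectorized numeric operations on it will be slower than on an int64 or Int64 column.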
Upvotes: 6
Reputation: 69725
As far as I have read in the pandas documentation, it is not possible to represent an integer NaN:
"In the absence of high performance NA support being built into NumPy from the ground up, the primary casualty is the ability to represent NAs in integer arrays."
As explained later in the documentation, this is due to memory and performance reasons, and also so that the resulting Series continues to be “numeric”. One possibility is to use dtype=object arrays instead.
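A short sketch of both points: the silent upcast to float that the documentation describes, and the dtype=object workaround:

```python
import numpy as np
import pandas as pd

# int64 cannot hold NaN, so pandas silently upcasts to float64
s = pd.Series([1, 2, np.nan])
print(s.dtype)  # float64

# A dtype=object Series keeps Python ints alongside NaN
s_obj = pd.Series([1, 2, np.nan], dtype=object)
print(s_obj.dtype)  # object
print(s_obj.tolist())
```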
Upvotes: 1