Reputation: 198
Consider the following code:
import numpy as np
import pandas as pd
myDict = ({"Row 1": [10, np.nan],
"Row 2": [10, "NaN"]})
myDf = pd.DataFrame(myDict)
This results in a dataframe in which the "Row 1" column shows 10.0 and NaN, while the "Row 2" column shows 10 and NaN.
Why does the use of np.nan turn the int into a float in the first column?
Upvotes: 2
Views: 634
Reputation: 143
My guess is that because you used quotation marks in the second column, it's treating the "NaN" as a string. As a result, the "Row 2" column was assigned the "object" data type instead of an integer or a float. Also, np.nan is a float, so the "Row 1" column, which contains both an integer and a float, was assigned the more general float type.
Using your code above, I ran the following:
In[1]:
type(np.nan)
Out[1]:
float
In[2]:
type("NaN")
Out[2]:
str
In[3]:
myDf.info()
Out[3]:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
Row 1 1 non-null float64
Row 2 2 non-null object
dtypes: float64(1), object(1)
memory usage: 112.0+ bytes
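The promotion described above can also be seen in NumPy alone, without pandas. This is only a small sketch; the variable names `arr` and `promoted` are made up for illustration.

```python
import numpy as np

# A NumPy array holds a single dtype for all of its elements.
# Mixing a Python int with np.nan (a float) promotes the array to float64.
arr = np.array([10, np.nan])
print(arr.dtype)   # float64

# The same promotion rule, queried directly for an int64/float64 pair.
promoted = np.result_type(np.int64, np.float64)
print(promoted)    # float64
```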
Upvotes: 0
Reputation: 53
Try
import numpy as np
import pandas as pd
myDict = ({"Row 1": [10, np.nan],
"Row 2": [10, "NaN"]})
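# pd.to_numeric does not accept a dict, so build the dataframe first
# and convert it column by column.
myDf = pd.DataFrame(myDict).apply(pd.to_numeric, errors="coerce")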
Upvotes: 0
Reputation: 294348
Pandas depends on NumPy for many things, including the null value np.nan. NumPy defines that value as a float (float64 once it is stored in an array). Pandas generally stores each dataframe column as a one-dimensional NumPy array, and NumPy requires every value in an array to be cast to the same dtype.
This would be fixed if Numpy had a null value for integers but it doesn't... yet.
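As a side note, pandas itself (version 0.24 and later) offers a nullable integer extension dtype that works around this at the pandas level rather than in NumPy. The sketch below assumes such a version is installed; the name `s` is just for illustration.

```python
import numpy as np
import pandas as pd

# "Int64" (capital I) is pandas' nullable integer extension dtype,
# not NumPy's int64. Missing entries are stored as pd.NA.
s = pd.Series([10, np.nan], dtype="Int64")

print(s.dtype)            # Int64
print(s.isna().tolist())  # [False, True]
```

The dtype has to be requested explicitly; the default inference for [10, np.nan] is still float64.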
When Pandas reads the dictionary and realizes that all values are numeric, it has two choices:
1. dtype object, retaining the values as given: [10, np.nan]
2. dtype float, upcasting the integer: [10.0, np.nan]
Pandas chooses the second option because people will almost always be doing numeric calculations, and float is optimized for that while object is not.
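Both choices can be reproduced by hand to compare them. This is a minimal sketch; `as_object` and `as_float` are illustrative names, and the object version only appears because the dtype is forced.

```python
import numpy as np
import pandas as pd

values = [10, np.nan]

# Choice 1: force dtype object, so the values are stored as given.
as_object = pd.Series(values, dtype=object)

# Choice 2: the pandas default, where the integer is upcast to float.
as_float = pd.Series(values)

print(as_object.dtype)  # object
print(as_float.dtype)   # float64
```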
The other column, [10, "NaN"], contains a string. Pandas does not attempt to convert it to float and leaves the column as object. It'll be up to you to fix it.
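One way to do that fix, sketched here on a standalone Series rather than the original dataframe, is pd.to_numeric with errors="coerce". The names `col` and `fixed` are made up for the example.

```python
import pandas as pd

# The string "NaN" forces the column to dtype object.
col = pd.Series([10, "NaN"])

# Convert to numbers; anything that cannot be parsed becomes NaN.
fixed = pd.to_numeric(col, errors="coerce")

print(col.dtype)    # object
print(fixed.dtype)  # float64
```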
Upvotes: 4
Reputation: 1537
myDict = ({"Row 1": [10.0, np.nan],
"Row 2": [10.0, "NaN"]})
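Writing the numbers as floats from the start should do the trick. If not, you can convert the dataframe's values afterwards:
myDf = myDf.apply(pd.to_numeric, errors="coerce")
errors="coerce" turns any value that cannot be parsed as a number into NaN instead of raising an error.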
Upvotes: 1