Reputation: 198

Numpy turning ints to decimals

Consider the following code

import numpy as np
import pandas as pd
myDict = ({"Row 1": [10, np.nan],
           "Row 2": [10, "NaN"]})
myDf = pd.DataFrame(myDict)

This results in the following dataframe:

   Row 1 Row 2
0   10.0    10
1    NaN   NaN

Why does the use of np.nan turn the int to a decimal in the first column?

Upvotes: 2

Answers (4)

Whip

Reputation: 143

My guess is that because you used quotation marks in the second column, pandas treats that "NaN" as a string. As a result, it assigned "Row 2" the object dtype instead of an integer or float dtype. np.nan, by contrast, is a float, so "Row 1", which holds both an integer and a float, was assigned the more general float dtype.

Using your code above, I ran the following:

In[1]:
type(np.nan)
Out[1]:
float

In[2]:
type("NaN")
Out[2]:
str


In[3]:
myDf.info()

Out[3]:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
Row 1    1 non-null float64
Row 2    2 non-null object
dtypes: float64(1), object(1)
memory usage: 112.0+ bytes

Upvotes: 0

Dante Filho

Reputation: 53

Try

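import numpy as np
import pandas as pd
myDict = {"Row 1": [10, np.nan],
          "Row 2": [10, "NaN"]}
# pd.to_numeric does not accept a dict, so build the frame first
# and convert it column by column.
myDf = pd.DataFrame(myDict).apply(pd.to_numeric, errors="coerce")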

Upvotes: 0

piRSquared

Reputation: 294348

Pandas depends on Numpy for many things, including the null value np.nan, which Numpy defines as a float. Pandas generally stores each dataframe column as a one-dimensional Numpy array, and a Numpy array requires all of its values to share the same dtype.

This would be fixed if Numpy had a null value for integers but it doesn't... yet.

When Pandas reads the dictionary and sees that all values in the column are numeric, it has two choices:

1. Cast the entire column to dtype object and retain the values [10, np.nan]
2. Cast the entire column to dtype float and convert the integer to a float: [10.0, np.nan]

Pandas chooses the second option because people will almost always be doing numeric calculations, and float is optimized for that while object is not.

In the other column, [10, "NaN"] contains a string, so Pandas doesn't attempt to convert the values to float and leaves the column as object. It'll be up to you to fix it.
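
The dtype choices described in this answer can be checked directly. The following is a minimal sketch, not part of the original answer. The last series assumes a pandas version that ships the nullable "Int64" extension dtype, which is one way to keep integers next to a missing value; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# An int mixed with np.nan is upcast to float64 by default.
as_float = pd.Series([10, np.nan])

# Forcing dtype=object keeps the original Python values untouched.
as_object = pd.Series([10, np.nan], dtype=object)

# Assumption: this pandas version provides the nullable "Int64" dtype,
# which stores integers alongside a missing-value marker.
as_nullable_int = pd.Series([10, np.nan], dtype="Int64")

print(as_float.dtype)
print(as_object.dtype)
print(as_nullable_int.dtype)
```

With the object dtype the 10 stays a Python int, but numeric operations on the column lose the speed of a typed array, which is the trade-off the answer points to.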

Upvotes: 4

A_kat

Reputation: 1537

myDict = ({"Row 1": [10.0, np.nan],
       "Row 2": [10.0, "NaN"]})

This should do the trick. If not, you can convert the values of the existing dataframe:

myDf = myDf.apply(pd.to_numeric, errors="coerce")

errors="coerce" converts any value that cannot be parsed as a number to NaN instead of raising an error.
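
To see what errors="coerce" does in practice, here is a small sketch based on the question's data. The third row, including the string "abc", is a hypothetical addition that is not in the original question; it is there only to show a value that cannot be parsed.

```python
import numpy as np
import pandas as pd

# "abc" and the extra third row are hypothetical, added to show coercion.
df = pd.DataFrame({"Row 1": [10, np.nan, 3],
                   "Row 2": [10, "NaN", "abc"]})

# pd.to_numeric is passed as a function; apply forwards errors="coerce"
# to it for each column.
converted = df.apply(pd.to_numeric, errors="coerce")

print(converted)
print(converted.dtypes)
```

Both columns end up as float64, so this makes the columns consistent but does not bring the integers back.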

Upvotes: 1
