Diehardwalnut

Reputation: 198

Numpy turning ints to decimals

Consider the following code

import numpy as np
import pandas as pd
myDict = ({"Row 1": [10, np.nan],
           "Row 2": [10, "NaN"]})
myDf = pd.DataFrame(myDict)

This results in the following dataframe

(Image not reproduced: "Row 1" displays 10.0 and NaN, while "Row 2" displays 10 and NaN.)

Why does the use of np.nan turn the int to a decimal in the first column?

Upvotes: 2

Views: 634

Answers (4)

Whip

Reputation: 143

My guess is that because you used quotation marks in the second column, it's treating the NaN as a string. As such, it assigned "Row 2" the data type "object" instead of an integer or a float. Also, np.nan is a float, so "Row 1", which includes both an integer and a float, got assigned the more general float type.

Using your code above, I ran the following:

In[1]:
type(np.nan)
Out[1]:
float

In[2]:
type("NaN")
Out[2]:
str


In[3]:
myDf.info()

Out[3]:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
Row 1    1 non-null float64
Row 2    2 non-null object
dtypes: float64(1), object(1)
memory usage: 112.0+ bytes 

Upvotes: 0

Dante Filho

Reputation: 53

Try

import numpy as np
import pandas as pd
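myDict = {"Row 1": [10, np.nan],
          "Row 2": [10, "NaN"]}
# pd.to_numeric works on one column at a time, so build the frame first
# and apply the conversion to each column.
myDf = pd.DataFrame(myDict).apply(pd.to_numeric, errors="coerce")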

Upvotes: 0

piRSquared

Reputation: 294348

Pandas depends on Numpy for many things, among them the null value np.nan. That value is a float: type(np.nan) is Python's float, which Numpy stores as float64. Pandas generally stores each dataframe column as a one-dimensional Numpy array, and Numpy requires all values in an array to share the same dtype.

This would be fixed if Numpy had a null value for integers but it doesn't... yet.
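For what it's worth, pandas itself (version 0.24 and later, as far as I know) offers a nullable integer extension dtype, spelled "Int64" with a capital I, which works around the limitation on the pandas side. A minimal sketch, assuming a reasonably recent pandas; the variable name s is only for illustration:

```python
import numpy as np
import pandas as pd

# Request the nullable integer extension dtype explicitly (note the capital "I").
# Without dtype="Int64", pandas would fall back to float64 here.
s = pd.Series([10, np.nan], dtype="Int64")

print(s.dtype)            # Int64
print(s.isna().tolist())  # [False, True]
print(s[0])               # 10
```

The integer stays an integer and the missing value is still reported as missing, but you have to opt in; the default inference is unchanged.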

When Pandas reads the dictionary and realizes that all values are numeric, it has two choices.

  1. Cast the entire column as dtype object and retain the values [10, np.nan]
  2. Cast the entire column as dtype float and augment the integer [10.0, np.nan]

Pandas chooses the second option because people are almost always doing numeric calculations, and float is optimized for such things while object is not.
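The two choices can be reproduced by hand. This is only a sketch to make the trade-off visible; the names values, as_float and as_object are made up for the example:

```python
import numpy as np
import pandas as pd

values = [10, np.nan]

# Option 2 is what pandas does by default: upcast everything to float64.
as_float = pd.Series(values)

# Option 1 has to be requested explicitly: keep the original Python objects.
as_object = pd.Series(values, dtype=object)

print(as_float.dtype)      # float64
print(as_object.dtype)     # object
print(type(as_object[0]))  # <class 'int'>
print(as_float.sum())      # 10.0 (NaN is skipped by default)
```

With dtype=object the 10 stays a Python int, but the column is then a container of arbitrary objects rather than a numeric array.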

In the other column, [10, "NaN"] contains a string, so Pandas doesn't attempt to convert the values to float and leaves the column as object. It'll be up to you to fix it.

Upvotes: 4

A_kat

Reputation: 1537

myDict = {"Row 1": [10.0, np.nan],
          "Row 2": [10.0, "NaN"]}

This should do the trick. If not, you can convert the values of the existing dataframe:

myDf = myDf.apply(pd.to_numeric, errors="coerce")

With errors="coerce", any value that cannot be parsed as a number becomes NaN instead of raising an error.
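To see what the coerce option does on a single column, here is a small sketch. The series raw and the string "not a number" are invented for the illustration and are not part of the question's data:

```python
import pandas as pd

# One real number, one "NaN" string, and one string that is clearly not numeric.
raw = pd.Series([10, "NaN", "not a number"])

# Without errors="coerce", the unparseable string would raise a ValueError.
cleaned = pd.to_numeric(raw, errors="coerce")

print(cleaned.dtype)            # float64
print(cleaned.isna().tolist())  # [False, True, True]
```

The result is a float64 column, so the integer still ends up displayed as 10.0, for the same reason as in the question.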

Upvotes: 1
