24hourbreakfast

Reputation: 185

Why is to_numeric() converting str to float instead of int?

Noob here.

I have a pandas dataframe, and I'm trying to convert a column of numbers from a string type to an integer. But when I use to_numeric(), it converts to a float instead.

I'm using Jupyter Notebook.

citydata.tcad_id

results in

0      0206180115
2      0125050304
3      0225050137
4      0124000601
          ...
995    0250300107
996    0217230301
997    0203030703
998    0135070323
999    0204160717
Name: tcad_id, Length: 1000, dtype: object

And

type(citydata.tcad_id[0])

shows the first (and subsequent) entries are...

str

So I tried

pd.to_numeric(citydata.tcad_id, downcast='integer', errors='coerce')

But that results in

0      206180115.0
1      419120319.0
2      125050304.0
3      225050137.0
4      124000601.0
          ...
995    250300107.0
996    217230301.0
997    203030703.0
998    135070323.0
999    204160717.0
Name: tcad_id, Length: 1000, dtype: float64

I need them to be integers so I can compare against another list of integers.

HALP!

Upvotes: 2

Views: 1818

Answers (2)

GermanK

Reputation: 1686

Probably too late, but are there NaN or infinite values in your data? That was the issue in my case. You can try:

import numpy as np
import pandas as pd
pd.to_numeric(citydata.tcad_id.replace([np.inf, -np.inf], np.nan).dropna(),
              downcast='integer', errors='coerce')
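
To check whether this is what is happening, a rough sketch like the following might help. The sample Series is made up (it stands in for citydata.tcad_id, with one deliberately unparseable entry), so treat it as an illustration rather than a diagnosis of your data:

```python
import pandas as pd

# Hypothetical stand-in for citydata.tcad_id; the last entry is deliberately bad.
ids = pd.Series(["0206180115", "0419120319", "not-a-number"], dtype=object)

# errors='coerce' turns the bad entry into NaN, and NaN cannot live in a
# NumPy integer column, so the result stays float64 despite downcast='integer'.
converted = pd.to_numeric(ids, downcast="integer", errors="coerce")
print(converted.dtype)

# Show the original strings that failed to convert.
bad_rows = ids[converted.isna()]
print(bad_rows)

# Converting only the rows that parsed cleanly gives an integer dtype.
clean = pd.to_numeric(ids[converted.notna()], downcast="integer")
print(clean.dtype)
```

If bad_rows comes back empty for your real column, NaN is probably not the cause and something else is going on.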

Upvotes: 1

NotAName

Reputation: 4322

If you have a look at the docs for to_numeric() you'll see the following:

The default return dtype is float64 or int64 depending on the data supplied. Use the downcast parameter to obtain other dtypes.

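So it seems like pandas has decided to cast your data into float64. Passing downcast='integer' gives integer values, but only when every value in the column is a whole number; a single NaN (for example one produced by errors='coerce') keeps the column as float64.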
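
If some entries really are missing and you still want integers, one option is pandas' nullable Int64 dtype. This is my own suggestion rather than something from the quoted docs, and both the sample Series and the list of IDs to compare against are invented for the example:

```python
import pandas as pd

# Hypothetical stand-in for citydata.tcad_id, with one missing value.
ids = pd.Series(["0206180115", "0125050304", None], dtype=object)

# The missing value forces float64 here.
numeric = pd.to_numeric(ids, errors="coerce")
print(numeric.dtype)

# The nullable Int64 dtype (capital I) stores integers alongside <NA>.
nullable = numeric.astype("Int64")
print(nullable)

# Hypothetical list of integer IDs to compare against.
other_ids = [206180115, 125050304]
matches = nullable.isin(other_ids)
print(matches)
```

Note that the leading zeros are lost either way once the IDs become numbers, so the list you compare against has to be plain integers too.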

Upvotes: 2
