Reputation: 185
Noob here.
I have a pandas DataFrame, and I'm trying to convert a column of numbers stored as strings to integers. But when I use pd.to_numeric(), it converts them to floats instead.
I'm using Jupyter Notebook.
citydata.tcad_id
results in
0 0206180115
2 0125050304
3 0225050137
4 0124000601
...
995 0250300107
996 0217230301
997 0203030703
998 0135070323
999 0204160717
Name: tcad_id, Length: 1000, dtype: object
And
type(citydata.tcad_id[0])
shows the first (and subsequent) entries are...
str
So I tried
pd.to_numeric(citydata.tcad_id, downcast='integer', errors='coerce')
But that results in
0 206180115.0
1 419120319.0
2 125050304.0
3 225050137.0
4 124000601.0
...
995 250300107.0
996 217230301.0
997 203030703.0
998 135070323.0
999 204160717.0
Name: tcad_id, Length: 1000, dtype: float64
I need them to be integers so I can compare against another list of integers.
HALP!
Upvotes: 2
Views: 1818
Reputation: 1686
Probably too late, but are there NaN or infinite values in your data? That was the issue in my case: a float column can hold NaN, but a plain integer column cannot, so pandas falls back to float64. You can try:
pd.to_numeric(citydata.tcad_id.replace([np.inf, -np.inf], np.nan).dropna(),
downcast='integer', errors='coerce')
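To check whether unparseable entries are the cause, it may help to look at which rows turn into NaN under errors='coerce'. The following is only a sketch: the Series below is a made-up stand-in for citydata.tcad_id, and the "N/A" and "unknown" entries are assumptions, not values from the question.

```python
import pandas as pd

# Hypothetical stand-in for citydata.tcad_id; the two non-numeric
# entries are invented for illustration.
tcad_id = pd.Series(["0206180115", "0419120319", "N/A", "unknown", "0125050304"])

# Anything that cannot be parsed becomes NaN, which forces float64.
converted = pd.to_numeric(tcad_id, errors="coerce")

# The original strings behind the NaN values are the likely culprits.
bad_rows = tcad_id[converted.isna()]
print(bad_rows)

# Converting only the parseable rows yields a plain integer dtype.
clean = pd.to_numeric(tcad_id[converted.notna()])
print(clean.dtype)
```

If bad_rows comes back empty on the real data, the float result has some other cause and this check will not explain it.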
Upvotes: 1
Reputation: 4322
If you have a look at the docs for pd.to_numeric you'll see the following:
The default return dtype is float64 or int64 depending on the data supplied. Use the downcast parameter to obtain other dtypes.
So it seems pandas has decided to cast your data to float64. Passing downcast='integer' gives integer values, but only when every value converts cleanly: if errors='coerce' turns any entry into NaN, the result stays float64.
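If the missing values need to stay in the column, one option is pandas' nullable integer dtype, "Int64" (capital I). A minimal sketch, assuming a made-up sample in place of citydata.tcad_id:

```python
import pandas as pd

# Hypothetical sample; "N/A" stands in for whatever fails to parse.
tcad_id = pd.Series(["0206180115", "N/A", "0125050304"])

# downcast='integer' cannot help while a NaN is present.
as_float = pd.to_numeric(tcad_id, errors="coerce", downcast="integer")
print(as_float.dtype)

# The nullable Int64 dtype stores integers and marks gaps as <NA>.
as_int = pd.to_numeric(tcad_id, errors="coerce").astype("Int64")
print(as_int.dtype)
print(as_int)
```

Note that either way the leading zeros are lost once the values become integers, which is fine for comparing against a list of integers.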
Upvotes: 2