Reputation: 449
I want to change the dtype of some columns in my DataFrame via iloc. But when I try this, the dtype does not change (it is still object):
import pandas as pd
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
df = pd.read_csv('iris.csv', names=names, header=None)
df = df[1:]
In [11]: df.head()
Out[11]:
   sepal-length  sepal-width  petal-length  petal-width   class
1           5.1          3.5           1.4          0.2  setosa
2           4.9          3.0           1.4          0.2  setosa
3           4.7          3.2           1.3          0.2  setosa
4           4.6          3.1           1.5          0.2  setosa
5           5.0          3.6           1.4          0.2  setosa
In [12]: df.iloc[:,:-1] = df.iloc[:,:-1].astype(float)
# No Error
In [13]: df.dtypes # still object dtype
Out[13]:
sepal-length    object
sepal-width     object
petal-length    object
petal-width     object
class           object
dtype: object
Note: I can do this without iloc, but it's too long:
df[['sepal-length', 'sepal-width', 'petal-length', 'petal-width']] = df[['sepal-length', 'sepal-width', 'petal-length', 'petal-width']].astype(float)
Upvotes: 3
Views: 3634
Reputation: 21
df.infer_objects() is a way to prepare a df for machine learning algorithms (like XGBoost). DataFrames imported from CSV often end up with 'object' dtype columns, and these are not supported by many machine learning libraries such as CatBoost and XGBoost. To get them working, use df.infer_objects().
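One caveat worth checking before relying on this: as far as I can tell, infer_objects only re-types object columns that already hold numeric Python objects; it does not parse strings such as '5.1'. The sketch below uses two made-up Series (the names as_strings and as_floats are illustrative, not from the question) to show the difference, with pd.to_numeric as the explicit parsing step.

```python
import pandas as pd

# Object column holding strings, e.g. when a header row ends up in the data.
# dtype=object is forced here purely for illustration.
as_strings = pd.Series(["5.1", "4.9"], dtype=object)

# Object column holding real Python floats.
as_floats = pd.Series([5.1, 4.9], dtype=object)

# Strings are not parsed, so this does not become a float dtype.
print(as_strings.infer_objects().dtype)

# Python floats stored as objects are soft-converted.
print(as_floats.infer_objects().dtype)  # float64

# Parsing strings needs an explicit conversion step.
parsed = pd.to_numeric(as_strings)
print(parsed.dtype)  # float64
```

If the columns still contain strings, an explicit astype(float) or pd.to_numeric call is probably the safer choice.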
Upvotes: 0
Reputation: 375675
You can use infer_objects:
In [11]: df.infer_objects()
Out[11]:
   sepal-length  sepal-width  petal-length  petal-width   class
1           5.1          3.5           1.4          0.2  setosa
2           4.9          3.0           1.4          0.2  setosa
3           4.7          3.2           1.3          0.2  setosa
4           4.6          3.1           1.5          0.2  setosa
5           5.0          3.6           1.4          0.2  setosa
In [12]: df.infer_objects().dtypes
Out[12]:
sepal-length    float64
sepal-width     float64
petal-length    float64
petal-width     float64
class            object
dtype: object
The issue is that whilst the right-hand-side is correct:
In [21]: df.iloc[:,:-1].astype(float).dtypes
Out[21]:
sepal-length    float64
sepal-width     float64
petal-length    float64
petal-width     float64
dtype: object
The assignment df.iloc[:,:-1] = ... is updating the existing columns in place and not changing their dtype.
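To make the in-place point concrete, here is a minimal sketch on a hypothetical two-row stand-in for the iris frame (the data and the names in_place and replaced are made up for illustration). It contrasts the iloc assignment with DataFrame.isetitem, which assumes pandas 1.5 or newer and is documented as always inserting a new array instead of writing into the existing one.

```python
import pandas as pd

# Hypothetical stand-in for the iris frame; every column is forced to
# object dtype to mimic the situation in the question.
df = pd.DataFrame(
    {
        "sepal-length": ["5.1", "4.9"],
        "sepal-width": ["3.5", "3.0"],
        "class": ["setosa", "setosa"],
    },
    dtype=object,
)

# The right-hand side really is float64.
converted = df.iloc[:, :-1].astype(float)

# iloc assignment writes the float values into the existing columns.
in_place = df.copy()
in_place.iloc[:, :-1] = converted
print(in_place.dtypes)

# isetitem swaps in new columns by position, so the dtype follows the values.
replaced = df.copy()
positions = list(range(df.shape[1] - 1))  # every column except the last
replaced.isetitem(positions, converted)
print(replaced.dtypes)
```

On the versions where the question's behaviour reproduces, the first printout should still show object for the numeric columns, while the second shows float64.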
Upvotes: 7
Reputation: 51395
The problem is with using iloc. You can get around this using regular column indexing:
df[df.columns[:-1]] = df[df.columns[:-1]].astype(float)
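A shorter variant of the same idea, sketched on made-up data: astype also accepts a mapping from column name to dtype, so the column list only has to be written once. The frame below is a hypothetical stand-in, not the real iris.csv.

```python
import pandas as pd

# Hypothetical stand-in for the iris frame with object columns.
df = pd.DataFrame(
    {
        "sepal-length": ["5.1", "4.9"],
        "petal-width": ["0.2", "0.2"],
        "class": ["setosa", "setosa"],
    },
    dtype=object,
)

# Map every column except the last one to float.
# astype returns a new frame, so the result is assigned back.
df = df.astype({col: float for col in df.columns[:-1]})
print(df.dtypes)
```

Because astype builds a new DataFrame here, the in-place behaviour of iloc assignment never comes into play.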
Alternatively:
You can apply to_numeric column by column like this; class is left as it is because it can't be converted:
df = df.apply(pd.to_numeric, errors='ignore')
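Note that errors='ignore' is deprecated in recent pandas releases (2.2 onwards, as far as I know). A rough equivalent is to catch the failure yourself. The helper name to_numeric_or_keep and the sample frame below are hypothetical, and this is only a sketch of the idea.

```python
import pandas as pd


def to_numeric_or_keep(column):
    """Return the column parsed as numbers, or unchanged if parsing fails."""
    try:
        return pd.to_numeric(column)
    except (ValueError, TypeError):
        return column


# Hypothetical stand-in for the iris frame with object columns.
df = pd.DataFrame(
    {
        "sepal-length": ["5.1", "4.9"],
        "sepal-width": ["3.5", "3.0"],
        "class": ["setosa", "setosa"],
    },
    dtype=object,
)

# apply uses axis=0 by default, so the helper is called once per column.
df = df.apply(to_numeric_or_keep)
print(df.dtypes)
```

Working per column matters here: applied per row, every row would contain the class string and nothing would be parsed.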
Upvotes: 5