Tomas T

Reputation: 449

Change dtype of specific columns with iloc

I want to change the dtype of some columns in my DataFrame using iloc. When I try this, no error is raised, but the dtype does not change (the columns are still object):

import pandas as pd
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
df = pd.read_csv('iris.csv', names=names, header=None)
df = df[1:]

In [11]: df.head()
Out[11]:
   sepal-length  sepal-width  petal-length  petal-width   class
1           5.1          3.5           1.4          0.2  setosa
2           4.9          3.0           1.4          0.2  setosa
3           4.7          3.2           1.3          0.2  setosa
4           4.6          3.1           1.5          0.2  setosa
5           5.0          3.6           1.4          0.2  setosa


In [12]: df.iloc[:,:-1] = df.iloc[:,:-1].astype(float)
# No Error

In [13]: df.dtypes  # still object dtype
Out[13]:
sepal-length    object
sepal-width     object
petal-length    object
petal-width     object
class           object
dtype: object

Note: I can do this without iloc, but it is too verbose:

df[['sepal-length', 'sepal-width', 'petal-length', 'petal-width']] = df[['sepal-length', 'sepal-width', 'petal-length', 'petal-width']].astype(float)

Upvotes: 3

Views: 3634

Answers (3)

PatRiot

Reputation: 21

df.infer_objects() is a straightforward way to prepare a DataFrame for machine learning libraries such as XGBoost. DataFrames imported from CSV often end up with object dtype columns, which many of these libraries (CatBoost, XGBoost, etc.) do not support. Note that infer_objects() returns a new DataFrame rather than modifying the original, so assign the result: df = df.infer_objects().
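A minimal sketch of what infer_objects does and does not convert, using made-up sample values rather than the iris file. It assumes the object columns already hold Python floats; as far as I know, numeric text such as "5.1" is left as it is, and pd.to_numeric or astype(float) is needed for that case.

```python
import pandas as pd

# Hypothetical sample: real floats stored in object-dtype columns.
df = pd.DataFrame(
    {"sepal-length": [5.1, 4.9], "class": ["setosa", "setosa"]},
    dtype=object,
)

# infer_objects returns a new DataFrame; the original is left untouched.
converted = df.infer_objects()

print(df["sepal-length"].dtype)
print(converted["sepal-length"].dtype)

# Numeric text is not parsed by infer_objects.
text = pd.DataFrame({"sepal-length": ["5.1", "4.9"]})
text_converted = text.infer_objects()
print(text_converted["sepal-length"].dtype)
```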

Upvotes: 0

Andy Hayden

Reputation: 375675

You can use infer_objects:

In [11]: df.infer_objects()
Out[11]:
   sepal-length  sepal-width  petal-length  petal-width   class
1           5.1          3.5           1.4          0.2  setosa
2           4.9          3.0           1.4          0.2  setosa
3           4.7          3.2           1.3          0.2  setosa
4           4.6          3.1           1.5          0.2  setosa
5           5.0          3.6           1.4          0.2  setosa

In [12]: df.infer_objects().dtypes
Out[12]:
sepal-length    float64
sepal-width     float64
petal-length    float64
petal-width     float64
class            object
dtype: object

The issue is that, whilst the right-hand side has the correct dtypes:

In [21]: df.iloc[:,:-1].astype(float).dtypes
Out[21]:
sepal-length    float64
sepal-width     float64
petal-length    float64
petal-width     float64
dtype: object

The assignment df.iloc[:,:-1] = ... updates the values of the existing columns in place rather than replacing the columns, so their dtype remains object.
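One way to keep the positional selection while still getting new dtypes is to pick the column labels by position and pass them to astype, then rebind df. This is only a sketch with made-up sample values; the name numeric_cols is introduced here for illustration.

```python
import pandas as pd

# Hypothetical sample: numeric text, as it might arrive from a CSV file.
df = pd.DataFrame(
    {
        "sepal-length": ["5.1", "4.9"],
        "sepal-width": ["3.5", "3.0"],
        "class": ["setosa", "setosa"],
    }
)

# Select labels by position (everything except the last column).
numeric_cols = df.columns[:-1]

# astype returns a new DataFrame, so rebinding df replaces the columns
# instead of writing into the existing object-dtype ones.
df = df.astype({col: float for col in numeric_cols})

print(df.dtypes)
```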

Upvotes: 7

sacuL

Reputation: 51395

The problem is the assignment through iloc. You can get around it with regular column indexing, which replaces the columns instead of writing into them:

df[df.columns[:-1]] = df[df.columns[:-1]].astype(float)

Alternatively:

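You can apply to_numeric to every column, and it will skip class because it can't be converted. Apply it column-wise (the default axis=0); with axis=1, to_numeric sees one row at a time, including the class string, so the row fails to convert:

df = df.apply(pd.to_numeric, errors='ignore')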
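Recent pandas releases deprecate errors='ignore' in pd.to_numeric, so the same idea can be sketched by catching the conversion error per column. The helper name to_numeric_where_possible and the sample values are made up for this example.

```python
import pandas as pd


def to_numeric_where_possible(column):
    """Return the column as numeric, or unchanged if a value fails to parse."""
    try:
        return pd.to_numeric(column)
    except (ValueError, TypeError):
        return column


# Hypothetical sample: numeric text plus one genuinely textual column.
df = pd.DataFrame(
    {
        "sepal-length": ["5.1", "4.9"],
        "petal-width": ["0.2", "0.2"],
        "class": ["setosa", "setosa"],
    }
)

# apply works column by column by default (axis=0).
df = df.apply(to_numeric_where_possible)

print(df.dtypes)
```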

Upvotes: 5
