Datatypes not changing in Pandas DataFrame

Question

I am trying to convert specific columns in my DataFrame to dtype: float. I tried this:

grid[['DISTINCT_COUNT','MAX_COL_LENGTH', 'MIN_COL_LENGTH', 'NULL_COUNT' ]].apply(pd.to_numeric, errors='ignore')

But when I print this afterwards:

print(grid.dtypes)

I am still seeing this:

COLUMN_NM         object
DISTINCT_COUNT    object
NULL_COUNT        object
MAX_COL_VALUE     object
MIN_COL_VALUE     object
MAX_COL_LENGTH    object
MIN_COL_LENGTH    object
TABLE_CNT         object
TABLE_NM          object
DATA_SOURCE       object
dtype: object

Any ideas?

pault · Accepted Answer

Using apply() does not modify the DataFrame in place. You have to assign the output of the operation back to the original DataFrame.

@coldspeed's answer to a related question explains what's going on:

All these slicing/indexing operations create views/copies of the original dataframe and you then reassign df to these views/copies, meaning the originals are not touched at all.

In your case, you need to do:

columns = ['DISTINCT_COUNT','MAX_COL_LENGTH', 'MIN_COL_LENGTH', 'NULL_COUNT']
grid[columns] = grid[columns].apply(pd.to_numeric, errors='ignore')

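Or you could convert one column at a time. Note that pd.to_numeric only accepts one-dimensional input (a Series, list, tuple, or 1-d array), so passing it grid[columns] directly raises a TypeError:

for col in columns:
    grid[col] = pd.to_numeric(grid[col], errors='ignore')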
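To see the effect end to end, here is a minimal sketch on a made-up frame. The data and the reduced column list are assumptions for illustration, not the asker's real grid. It uses errors='coerce' instead of errors='ignore': with 'ignore', a column containing even one unparseable value is returned unchanged and stays object, and, as far as I know, that option is deprecated in recent pandas releases (2.2 and later).

```python
import pandas as pd

# Hypothetical stand-in for `grid`: every column starts out as object dtype.
grid = pd.DataFrame(
    {
        "COLUMN_NM": ["id", "name"],
        "DISTINCT_COUNT": ["10", "7"],
        "NULL_COUNT": ["0", "n/a"],  # "n/a" cannot be parsed as a number
    },
    dtype=object,
)
columns = ["DISTINCT_COUNT", "NULL_COUNT"]

# apply() returns a new DataFrame; `grid` itself is not touched here.
converted = grid[columns].apply(pd.to_numeric, errors="coerce")
dtypes_before = grid.dtypes.copy()

# Assigning the result back is what actually changes the columns in `grid`.
grid[columns] = converted
dtypes_after = grid.dtypes

print(dtypes_before)
print(dtypes_after)
```

After the assignment, DISTINCT_COUNT should come out as an integer dtype and NULL_COUNT as float64, with NaN in place of "n/a". If every column has to be float specifically, chaining .astype(float) onto the converted frame is one option.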
