Benjamin

Reputation: 3477

Pandas: detect the problematic column on a cast error

I have some CSV files, and sometimes I misconfigure the dtype parameter of pandas.read_csv, so pandas fails with:

TypeError: Cannot cast array data from dtype('float64') to dtype('int64') according to the rule 'safe'

without saying which column the conversion failed on.

How can I retrieve the name or index of the column that failed (and maybe the first wrong value)?

PS: I cannot rely on automatic type detection / inference.

Upvotes: 0

Views: 834

Answers (1)

Gijs Wobben

Reputation: 2060

A straightforward way is to let pandas read your CSV without imposing a dtype, and then loop over the columns, trying to set the intended dtype on each one. The column that raises is the one that is misconfigured.

import pandas
import random

# Sample dataset, read yours with
# df = pandas.read_csv("myfile.csv")
df = pandas.DataFrame([{"A": random.randint(0, 100), "B": "test " + str(random.random())} for _ in range(1000)])

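# Loop over the columns
for column in df.columns:
    try:
        # Cast to the intended type (int here; substitute the dtype you expect for each column)
        df[column] = df[column].astype(int)
    except (ValueError, TypeError) as exc:
        # Catch only conversion errors, and report which column failed and why
        print("Error trying to set type of column:", column, "-", exc)
        # Optional: re-raise the exception here to stop execution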
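
One caveat with the loop above: astype(int) silently truncates floats such as 3.5 instead of raising, so it may miss exactly the float64-to-int64 case from the question. Below is a minimal sketch of a stricter check that also reports the first offending value. The helper name find_bad_int_columns and the sample data are made up for illustration, and it only handles columns that are supposed to be integers:

```python
import io

import pandas as pd


def find_bad_int_columns(df, int_columns):
    """Return {column: (row_label, value)} for the first value in each
    column that cannot be stored losslessly as an integer.

    Hypothetical helper, not part of pandas.
    """
    problems = {}
    for column in int_columns:
        # Non-numeric entries become NaN instead of raising
        numeric = pd.to_numeric(df[column], errors="coerce")
        # A value is bad if it is missing/non-numeric or has a fractional part
        bad = numeric.isna() | (numeric % 1 != 0)
        if bad.any():
            first = bad.idxmax()  # label of the first True entry
            problems[column] = (first, df.at[first, column])
    return problems


# Made-up sample data; read yours without a dtype, e.g. pd.read_csv("myfile.csv")
csv_text = "A,B,C\n1,2,x\n2,3.5,y\n3,4,z\n"
df = pd.read_csv(io.StringIO(csv_text))

problems = find_bad_int_columns(df, ["A", "B", "C"])
for column, (row, value) in problems.items():
    print(f"Column {column!r}: first bad value {value!r} at row {row}")
```

This assumes the DataFrame has unique row labels (the default when reading a CSV). Missing values are reported as bad as well, since NaN cannot be stored in a plain int64 column.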

Upvotes: 1
