Reputation: 3477
I have some CSV files, and sometimes I misconfigure the dtype parameter of pandas.read_csv, so pandas fails with:
TypeError: Cannot cast array data from dtype('float64') to dtype('int64') according to the rule 'safe'
without saying which column the conversion failed on.
How can I retrieve the name or index of the failing column (and ideally the first offending value)?
PS: I cannot rely on automatic type detection / inference.
Upvotes: 0
Views: 834
Reputation: 2060
One way to go is to let pandas read your CSV without imposing a dtype, and then loop over the columns, trying to cast each one to the correct dtype. The column whose cast raises is the one that is misconfigured.
import random

import pandas

# Sample dataset; read yours with
# df = pandas.read_csv("myfile.csv")
df = pandas.DataFrame(
    [{"A": random.randint(0, 100), "B": "test " + str(random.random())} for _ in range(1000)]
)

# Loop over the columns
for column in df.columns:
    try:
        # Cast to the intended type (int for every column in this sample)
        df[column] = df[column].astype(int)
    except (ValueError, TypeError) as error:
        print("Error trying to set type of column:", column, "-", error)
        # Optional: re-raise the exception here to stop execution
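If you also want the first offending value, and each column has its own target dtype, the same idea can be extended roughly as follows. This is only a sketch: check_dtypes, first_bad_value, and the sample data are names I made up for illustration, and it assumes you can afford to read every column as text first (dtype=str) and cast afterwards. The value-by-value scan is slow, but it only runs on columns whose cast has already failed.

```python
import io

import pandas as pd


def first_bad_value(series, dtype):
    """Return (row label, value) of the first entry that cannot be cast, or None."""
    for label, value in series.items():
        try:
            # Cast a one-element Series so the same rules apply as for the column.
            pd.Series([value]).astype(dtype)
        except (ValueError, TypeError):
            return label, value
    return None


def check_dtypes(csv_source, dtypes):
    """Read the CSV as text, cast column by column, and collect the failures."""
    # dtype=str keeps the raw text, so read_csv itself never fails on a cast.
    df = pd.read_csv(csv_source, dtype=str)
    problems = {}
    for column, dtype in dtypes.items():
        try:
            df[column] = df[column].astype(dtype)
        except (ValueError, TypeError) as error:
            problems[column] = (first_bad_value(df[column], dtype), str(error))
    return df, problems


# Hypothetical sample data: column B contains a value that is not an integer.
sample = io.StringIO("A,B\n1,10\n2,3.5\n3,30\n")
df, problems = check_dtypes(sample, {"A": "int64", "B": "int64"})

for column, (bad, message) in problems.items():
    print(f"Column {column!r}: first bad value {bad}: {message}")
```

With the sample above, this should report column B and point at row 1 with the value '3.5', while column A is cast to int64 without complaint. Columns that fail stay as text in the returned DataFrame, so you can inspect them further.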
Upvotes: 1