Any way to flag bad lines in pandas when reading an excel file?

Question

pandas.read_csv has (warn, error) bad lines methods. I can't see any for pandas.read_excel. Is there a reason? For example, if I wanted to read an excel file where a column is supposed to be a datetime and the pandas.read_execl function encounters an int or str in one/few of the rows. Do i need to handle this myself?

David Erickson · Accepted Answer

In short, no I do not believe there is a way to do automatically do this with a parameter you pass to read_excel(). This is how to solve your problem though:

Let's say that when you read in your dataframe it looks like this:

df = pd.read_excel('Desktop/Book1.xlsx')
df
                   Date
0   2020-09-13 00:00:00
1   2
2   abc
3   2020-09-14 00:00:00

You can you pass errors='coerce' to pd.to_datetime():

df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
df

          Date
0   2020-09-13
1   NaT
2   NaT
3   2020-09-14

Finally, you can drop those rows with:

df = df[df['Date'].notnull()]
df

          Date
0   2020-09-13
3   2020-09-14

Any way to flag bad lines in pandas when reading an excel file?

Answers (1)

Related Questions