Pandas skipping malformed line in csv

Question

I am trying to read a CSV file with pandas. The file is very long and is malformed in the middle, like so:

Date,Received Date,Tkr,Theta,Wid,Per
2007-08-03,2017/02/13 05:30:G,F,B,A,1
2007-08-06,2017/02/13 05:30:G,F,A,B,1
2007-08-07,2017/02/13 05:30:G,F,A,B,1
2007-08-,nan,,,,
2000-05-30 00:00:00,2017/02/14 05:30:F,D,10,1,1
2000-05-31 00:00:00,2017/02/14 05:30:F,D,10,1,1

The line that fails is this:

full_frame = pd.read_csv(path, parse_dates=["Date"],error_bad_lines=False).set_index("Date").sort_index()[:date]

with the error

TypeError: unorderable types: str() > datetime.datetime()
   File "/A/B/C.py", line 236, in load_ex
    full_frame = pd.read_csv(path, parse_dates=["Date"],error_bad_lines=False).set_index("Date").sort_index()[:date]

date is just a variable that holds a given input date.

This is happening because of the broken line in the middle. I have tried passing error_bad_lines=False, but that won't prevent my script from failing.

When I take the bad line out of my CSV and run it, it works fine. This CSV will be used as an input and I can't modify it at the source, so I was wondering if there is a way in pandas to skip a line based on its length, or something else I can do to make it work without duplicating or modifying the file.
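One possible way to skip such a row without touching the file on disk is to filter the lines in memory and hand pandas the filtered buffer. The sketch below is only an illustration, not a confirmed fix: the helper name filter_complete_rows is made up, and it assumes a row counts as "bad" when it has the wrong number of fields or any empty field, which matches the sample above but may not match the real file. Note that the broken row still has six comma-separated fields, which is presumably why error_bad_lines=False leaves it in place.

```python
import csv
import io


def filter_complete_rows(lines):
    """Return a StringIO holding only the header and the complete rows.

    Hypothetical helper: a row is kept only if it has as many fields as
    the header and none of them is empty (an assumption about what
    "malformed" means for this file).
    """
    reader = csv.reader(lines)
    header = next(reader)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    for row in reader:
        if len(row) == len(header) and all(field.strip() for field in row):
            writer.writerow(row)
    buffer.seek(0)  # rewind so the caller can read from the start
    return buffer


# Shortened copy of the sample from the question, used only for the demo.
sample = """Date,Received Date,Tkr,Theta,Wid,Per
2007-08-03,2017/02/13 05:30:G,F,B,A,1
2007-08-,nan,,,,
2000-05-30 00:00:00,2017/02/14 05:30:F,D,10,1,1
"""
cleaned = filter_complete_rows(io.StringIO(sample))
print(cleaned.getvalue())
```

With a real file, something like opening it with open(path, newline="") and passing the file object to filter_complete_rows should work the same way. The returned buffer can then be given to pd.read_csv in place of path, since read_csv accepts file-like objects.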

UPDATE

If I simply call read_csv, the bad line is stored in my data frame as

2007-08- NaN NaN NaN NaN NaN

UPDATE 2:

If I just try

full_frame = pd.read_csv(path, parse_dates=["Date"],error_bad_lines=False)
full_frame = full_frame.dropna(how="any")
# this drops the NaN row for sure
full_frame = full_frame.set_index("Date").sort_index()[:date]

it still gives the same error :(
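A likely reason dropna does not help: when a value in Date cannot be parsed, read_csv leaves the whole column as plain strings, and dropping the bad row afterwards does not convert the remaining values. A minimal sketch of one workaround is to convert the column explicitly with pd.to_datetime and errors="coerce", then drop the rows that become NaT. It assumes the only two formats in the column are the ones in the sample (YYYY-MM-DD with or without a time part); the helper name load_frame is hypothetical.

```python
import io
from datetime import datetime

import pandas as pd


def load_frame(source, date):
    """Read the CSV, discard rows whose Date cannot be parsed, slice up to date."""
    # Read Date as text first so the conversion can be controlled explicitly.
    frame = pd.read_csv(source, dtype={"Date": str})
    raw = frame["Date"]
    # Try each assumed format; values matching neither end up as NaT.
    parsed = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")
    with_time = pd.to_datetime(raw, format="%Y-%m-%d %H:%M:%S", errors="coerce")
    frame["Date"] = parsed.fillna(with_time)
    # Drop only the rows whose Date is still missing after both attempts.
    frame = frame.dropna(subset=["Date"])
    return frame.set_index("Date").sort_index().loc[:date]


# Sample copied from the question, used only for the demo.
sample = """Date,Received Date,Tkr,Theta,Wid,Per
2007-08-03,2017/02/13 05:30:G,F,B,A,1
2007-08-06,2017/02/13 05:30:G,F,A,B,1
2007-08-07,2017/02/13 05:30:G,F,A,B,1
2007-08-,nan,,,,
2000-05-30 00:00:00,2017/02/14 05:30:F,D,10,1,1
2000-05-31 00:00:00,2017/02/14 05:30:F,D,10,1,1
"""
result = load_frame(io.StringIO(sample), datetime(2007, 8, 6))
print(result.index)
```

Because the index is now a real DatetimeIndex, comparing it against a datetime no longer mixes str and datetime. Using dropna(subset=["Date"]) rather than how="any" keeps rows that merely have gaps in other columns.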

