Austen Novis

Reputation: 444

read_csv() & EOF character in string cause parsing issue

I am trying to read 50 CSV files from a zip file but keep getting:

CParserError: Error tokenizing data. C error: EOF inside string starting at line 166

I know the error comes from one particular string in the data, and I could fix it by hand, but I don't want to extract every CSV file and fix each one manually.

import zipfile
import pandas as pd
container = {}
with zipfile.ZipFile(r'C:\Users\Austen\Anaconda\cs109_final\CA34.zip') as zf:
    for name in zf.namelist():
        container[name] = pd.read_csv(zf.open(name))

The problem I found is that there is a single ; in each csv file towards the end of the file. How would I ignore that?

With reference from:

https://github.com/pydata/pandas/issues/5500

Tried to add

    container[name] = pd.read_csv(zf.open(name),skipfooter=4) 

But I get 'unexpected end of data'

Upvotes: 1

Views: 10373

Answers (2)

Kondalarao V

Reputation: 800

Passing engine="python" solves the issue.

Reference: Most frequent errors
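As a rough illustration of combining engine="python" with the skipfooter option tried in the question, here is a minimal sketch. The sample data is made up (two data rows followed by a stray `;` line) and only stands in for the real files. skipfooter is not supported by the C engine, so the Python engine is requested explicitly. This only covers a trailer that contains no unterminated quote; a trailer with an open quote may still fail.

```python
import io

import pandas as pd

# Hypothetical sample: two data rows followed by a stray trailer line.
sample = "a,b\n1,2\n3,4\n;\n"

# skipfooter is unsupported by the C engine, so ask for the Python engine.
df = pd.read_csv(io.StringIO(sample), engine="python", skipfooter=1)
print(df)
```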

Upvotes: 2

Selah

Reputation: 8064

Would adding an option to read_csv fix the problem? I had a similar problem and it was fixed by adding the option quoting=csv.QUOTE_NONE

For example:

df = pd.read_csv(csvfile, header = None, delimiter="\t", quoting=csv.QUOTE_NONE, encoding='utf-8')

The second comment in this discussion talks about why: https://github.com/pydata/pandas/issues/5500
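To tie this back to the zip loop in the question, here is a minimal sketch of the quoting=csv.QUOTE_NONE approach. The helper name `read_zip_csvs` and the in-memory archive with its `bad.csv` member are made up for illustration and are not from the original post. With QUOTE_NONE the parser treats quote characters as ordinary text, so a quote that never closes no longer swallows the rest of the file. The trade-off is that quote characters stay in the data, and a delimiter inside a quoted field will split that field.

```python
import csv
import io
import zipfile

import pandas as pd


def read_zip_csvs(zip_source, **read_csv_kwargs):
    """Read every member of a zip archive into a dict of DataFrames (hypothetical helper)."""
    frames = {}
    with zipfile.ZipFile(zip_source) as zf:
        for name in zf.namelist():
            with zf.open(name) as member:
                frames[name] = pd.read_csv(member, **read_csv_kwargs)
    return frames


# Build a small in-memory archive whose only file has a quote that never closes.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("bad.csv", 'a,b\n1,"unterminated\n2,ok\n')
buffer.seek(0)

# QUOTE_NONE makes the parser treat the quote character as plain text.
container = read_zip_csvs(buffer, quoting=csv.QUOTE_NONE)
print(container["bad.csv"])
```

In the question's setting, the same helper could be given the path to the zip file instead of the in-memory buffer.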

Upvotes: 6
