Reputation: 444
I am trying to read 50 CSV files from a zip file but keep getting
CParserError: Error tokenizing data. C error: EOF inside string starting at line 166
I know the error comes from a particular string in the data and I can fix it manually, but I don't want to extract all the CSV files and fix each one by hand.
import zipfile
import pandas as pd
container = {}
with zipfile.ZipFile(r'C:\Users\Austen\Anaconda\cs109_final\CA34.zip') as zf:  # raw string keeps the backslashes literal
    for name in zf.namelist():
        container[name] = pd.read_csv(zf.open(name))
The problem I found is that there is a single ; in each csv file towards the end of the file. How would I ignore that?
Following the discussion at
https://github.com/pydata/pandas/issues/5500
I tried adding skipfooter:
container[name] = pd.read_csv(zf.open(name), skipfooter=4)
But then I get an 'unexpected end of data' error.
Upvotes: 1
Views: 10373
Reputation: 800
Passing engine="python" to pd.read_csv solves the issue.
Reference: Most frequent errors
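As a rough sketch of how that could look inside the zip loop from the question: the helper name read_zip_csvs, the in-memory sample archive, and the utf-8 encoding are assumptions made for illustration, not part of the original post. Whether the Python engine actually gets past a malformed row depends on the data and the pandas version, so treat this as a starting point.

```python
import io
import zipfile

import pandas as pd


def read_zip_csvs(zip_source):
    """Read every .csv member of a zip archive with the Python parser engine.

    Hypothetical helper: returns a dict mapping member name -> DataFrame.
    """
    frames = {}
    with zipfile.ZipFile(zip_source) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip non-CSV members such as folders or readme files
            with zf.open(name) as raw:
                # zf.open() yields bytes; wrap it so the parser sees text.
                # The utf-8 encoding is an assumption about the files.
                text = io.TextIOWrapper(raw, encoding="utf-8")
                frames[name] = pd.read_csv(text, engine="python")
    return frames


# Small in-memory archive so the sketch runs without the original CA34.zip.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.csv", "x,y\n1,2\n3,4\n")
    zf.writestr("b.csv", "x,y\n5,6\n")
buf.seek(0)

container = read_zip_csvs(buf)
```

The same function should accept a path such as the zip file from the question in place of the in-memory buffer.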
Upvotes: 2
Reputation: 8064
Would adding an option to read_csv fix the problem? I had a similar problem, and adding quoting=csv.QUOTE_NONE fixed it.
For example:
import csv
df = pd.read_csv(csvfile, header=None, delimiter="\t", quoting=csv.QUOTE_NONE, encoding='utf-8')
The second comment in this discussion talks about why: https://github.com/pydata/pandas/issues/5500
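To illustrate the effect, here is a minimal sketch on made-up data (the column names and rows are hypothetical). With quoting=csv.QUOTE_NONE the parser treats a double quote as an ordinary character, so an unbalanced quote no longer opens a string that runs to the end of the file.

```python
import csv
import io

import pandas as pd

# Hypothetical tab-separated sample: the last row contains an unbalanced quote.
raw = 'id\tnote\n1\tfine\n2\tstray " quote\n'

# QUOTE_NONE disables quote handling, so the '"' is kept as literal text.
df = pd.read_csv(io.StringIO(raw), delimiter="\t", quoting=csv.QUOTE_NONE)
```

The trade-off is that fields which are legitimately quoted, for example ones containing the delimiter, will no longer be unquoted or kept together, so this fits best when the files do not rely on quoting.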
Upvotes: 6