ParserError: Error tokenizing data C error

Question

I am trying to run this code, which removes unnecessary columns from a dataframe for later processing. It loops through the first few files and then raises the error below. It was running fine before. I read that the cause might be a corrupted file, so I deleted all the previous files and regenerated them step by step, but I still get the error. Sorry if this is long-winded; I need to show each step for my thesis, and I am still very much a novice programmer. Can anyone help me fix this issue?

The code is:

import pandas as pd
import os

path = ('./Sketch_grammar/weighted/')
files = os.listdir(path)
for file in files:
    df = pd.read_csv(path+file)
    df = df.drop('Hits', axis=1)
    df = df.drop('Score', axis=1)
    df = df.drop('Score.1', axis=1)
    print(df)
    filename = os.path.splitext(file)
    (f, ext) = filename
    print(f)
    df.to_csv(path+'weighted_out/'+f+'_out.csv', index=False)

The error message is as follows:

Traceback (most recent call last):
  File "/home/sandra/git/trees/trees/remove_columns.py", line 9, in <module>
    df = pd.read_csv(path+file)
  File "/home/sandra/miniconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 678, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/sandra/miniconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 440, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/sandra/miniconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 787, in __init__
    self._make_engine(self.engine)
  File "/home/sandra/miniconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 1014, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/home/sandra/miniconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 1708, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 539, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 737, in pandas._libs.parsers.TextReader._get_header
  File "pandas/_libs/parsers.pyx", line 932, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 2112, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Calling read(nbytes) on source failed. Try engine='python'


Vishnudev Krishnadas · Accepted Answer

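This error is usually raised when the file passed to pandas is corrupted or cannot be read as a regular file. In your code, os.listdir(path) returns every entry in the folder, not only CSV files. Since the output folder weighted_out/ sits inside path, it is most likely being passed to pd.read_csv() as well, and a directory cannot be read as a CSV. Restricting the loop to .csv files, as below, should work: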

import os
import pandas as pd

path = './Sketch_grammar/weighted/'
for file in os.listdir(path):
    # Skip anything that is not a CSV file, e.g. the weighted_out/ subdirectory
    if file.endswith('.csv'):
        df = pd.read_csv(os.path.join(path, file))
        # Remove the columns that are not needed for later processing
        df = df.drop(['Hits', 'Score', 'Score.1'], axis=1)
        f, ext = os.path.splitext(file)
        df.to_csv(os.path.join(path, 'weighted_out', f + '_out.csv'), index=False)
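If you want the loop to be a bit more defensive, here is a minimal sketch of the same idea using pathlib. The function name strip_columns and its arguments are my own illustrative choices, not anything from pandas; the column names come from your question. It assumes you would rather skip an unreadable file and print its name than abort the whole run, which also tells you which file is the culprit.

```python
from pathlib import Path

import pandas as pd

# Column names taken from the question; the function name and the
# directory arguments are illustrative assumptions.
DROP_COLUMNS = ["Hits", "Score", "Score.1"]


def strip_columns(in_dir, out_dir, columns=DROP_COLUMNS):
    """Drop `columns` from every CSV file in in_dir and write the results to out_dir."""
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # glob("*.csv") ignores non-CSV entries; is_file() ignores directories,
    # including an output folder that lives inside in_dir.
    for csv_path in sorted(in_dir.glob("*.csv")):
        if not csv_path.is_file():
            continue
        try:
            df = pd.read_csv(csv_path)
        except (pd.errors.ParserError, pd.errors.EmptyDataError) as exc:
            # Report the offending file instead of stopping the whole loop.
            print(f"Skipping {csv_path.name}: {exc}")
            continue
        # errors="ignore" tolerates files that lack one of the columns.
        df = df.drop(columns=columns, errors="ignore")
        out_path = out_dir / f"{csv_path.stem}_out.csv"
        df.to_csv(out_path, index=False)
        written.append(out_path)
    return written
```

With the paths from your question, this would be called as strip_columns('./Sketch_grammar/weighted/', './Sketch_grammar/weighted/weighted_out/'). The returned list holds the paths of the files that were actually written, so you can compare it against what you expected to be processed.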
