Claire_L

Reputation: 13

Python pandas read_csv() ParserError: unexpected end of data

My csv file looks like this:

"City","Name","Comment"
"A","Jay","Like it"
"B","Rosy","Well, good"
...

"K","Anna","Works "fine""

The expected output (DataFrame):

City,Name,Comment
A,Jay,'Like it'
B,Rosy,'Well, good'
...
K,Anna,'Works "fine"'

I am trying to read it like this:

df = pd.read_csv("test.csv", sep=',', engine='python', encoding='utf8', quoting=csv.QUOTE_ALL)

It raises this error:

ParserError: unexpected end of data

I need all of the data, so I cannot skip the offending lines with error_bad_lines=False.

How can I fix this issue?

UPDATE: It turns out my original CSV file is missing a quote at the end of the file. I solved the problem by identifying the errors in the file and modifying them.
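The update does not say how the errors were identified. One possible way to locate a missing quote, sketched here under the assumption that no quoted field spans several lines, is to flag every line with an odd number of double quotes. The helper name `find_unbalanced_quote_lines` and the sample text are hypothetical, not taken from the original file.

```python
def find_unbalanced_quote_lines(text):
    """Return 1-based numbers of lines holding an odd number of double quotes."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if line.count('"') % 2 == 1
    ]


# Hypothetical sample: the last line lacks its closing quote.
sample = '"City","Name","Comment"\n"A","Jay","Like it"\n"K","Anna","Works fine\n'
print(find_unbalanced_quote_lines(sample))  # prints [3]
```

A line such as "K","Anna","Works "fine"" has an even number of quotes, so this check would not flag it. It only catches quotes that are missing outright.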

Upvotes: 0

Views: 3097

Answers (1)

Ossi H.

Reputation: 84

I believe the trick is to preprocess the text so that quotes inside a field are escaped, and then read the data:


import re
from io import StringIO
import pandas as pd

data = """
"City","Name","Comment"
"A","Jay","Like it"
"B","Rosy","Well, good"
"K","Anna","Works "fine""
"""

data = re.sub('(?<!^)"(?!,")(?<!,")(?!$)', '\\"', data, flags=re.M)

x = pd.read_csv(StringIO(data), escapechar='\\')

print(x)

Outputs

   City  Name       Comment
0     A   Jay       Like it
1     B  Rosy    Well, good
2     K  Anna  Works "fine"
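The lookarounds in the pattern above are easier to follow when written one per line. The sketch below restates the same pattern with re.VERBOSE and checks the result with the standard-library csv module instead of pandas. The name `INNER_QUOTE` is only illustrative.

```python
import csv
import re
from io import StringIO

# Same pattern as above, spelled out with one lookaround per line.
INNER_QUOTE = re.compile(
    r'''
    (?<!^)     # not the quote that opens the line
    "          # a double quote
    (?!,")     # not followed by ," (so it does not close a field)
    (?<!,")    # not preceded by a comma (so it does not open a field)
    (?!$)      # not the quote that closes the line
    ''',
    re.MULTILINE | re.VERBOSE,
)

line = '"K","Anna","Works "fine""'
escaped = INNER_QUOTE.sub('\\"', line)  # put a backslash before each inner quote
print(escaped)

# csv.reader honours the same escape character that read_csv is given above.
rows = list(csv.reader(StringIO(escaped), escapechar='\\'))
print(rows)
```

A well-formed line such as "B","Rosy","Well, good" passes through unchanged. The pattern decides by neighbouring characters only, so a comment that itself contains the sequence "," would still be split incorrectly.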

In theory, this should work the same way when reading from the file:

with open('test.csv', 'r') as f:
    data = re.sub('(?<!^)"(?!,")(?<!,")(?!$)', '\\"', f.read(), flags=re.M)
    df = pd.read_csv(StringIO(data), escapechar='\\')
    print(df)

Edit: It outputs the following:

  City  Name       Comment
0    A   Jay       Like it
1    B  Rosy    Well, good
2    K  Anna  Works "fine"

From kayoz's answer

Edit 2: The last column is easy to wrap in single quotes with a lambda or a similar function.

df['Comment'] = df['Comment'].apply(lambda x: "'" + str(x) + "'")
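As an alternative to the lambda, the same wrapping can be written as plain string concatenation on the whole column. This is a small sketch on a hypothetical two-row frame and assumes pandas is installed.

```python
import pandas as pd

# Hypothetical frame standing in for the parsed CSV.
df = pd.DataFrame({"Comment": ["Like it", 'Works "fine"']})

# Concatenate a single quote on each side of every value in the column.
df["Comment"] = "'" + df["Comment"].astype(str) + "'"
print(df["Comment"].tolist())
```

Both forms give the same strings, so the choice is a matter of taste.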

Link to the original

TL;DR

import re
from io import StringIO
import pandas as pd

with open('test.csv', 'r') as f:
    data = re.sub('(?<!^)"(?!,")(?<!,")(?!$)', '\\"', f.read(), flags=re.M)
    df = pd.read_csv(StringIO(data), escapechar='\\')
    df['Comment'] = df['Comment'].apply(lambda x: "'" + str(x) + "'")
    print(df)

Outputs

  City  Name         Comment
0    A   Jay       'Like it'
1    B  Rosy    'Well, good'
2    K  Anna  'Works "fine"'

Upvotes: 1
