Ugur
Ugur

Reputation: 2044

Python csv reader // how to ignore enclosing char (because sometimes it's missing)

I am trying to import csv data from files where sometimes the enclosing char " is missing.

So I have rows like this:

"ThinkPad";"2000.00";"EUR"
"MacBookPro";"2200.00;EUR"  
# In the second row the closing " after 2200.00 is missing
# also the closing " before EUR" is missing

Now I am reading the csv data with this:

csv.reader(
            codecs.open(filename, 'r', encoding='latin-1'),
            delimiter=";",
            dialect=csv.excel_tab)

And the data I get for the second row is this:

["MacBookPro", "2200.00;EUR"]

Aside from pre-processing my csv files with a unix command like sed and removing all closing chars " and relying on the semicolon to seperate the columns, what else can I do?

Upvotes: 0

Views: 1746

Answers (2)

Wayne Werner
Wayne Werner

Reputation: 51877

This might work:

import csv
import io

file = io.StringIO(u'''
"ThinkPad";"2000.00";"EUR"
"MacBookPro";"2200.00;EUR"
'''.strip())

reader = csv.reader((line.replace('"', '') for line in file), delimiter=';', quotechar='"')
for row in reader:
    print(row)

The problem is that if there are any legitimate quoted line, e.g.

"MacBookPro;Awesome Edition";"2200.00";"EUR"

Or, worse:

"MacBookPro:
Description: Awesome Edition";"2200.00";"EUR"

Your output is going to produce too few/many columns. But if you know that's not a problem then it will work fine. You could pre-screen the file by adding this before the read part, which would give you the malformed line:

for line in file:
    if line.count(';') != 2:
        raise ValueError('No! This file has broken data on line {!r}'.format(line))
file.seek(0)

Or alternatively you could screen as you're reading:

for row in reader:
if any(';' in _ for _ in row):
    print('Error:')
print(row)

Ultimately your best option is to fix whatever is producing your garbage csv file.

Upvotes: 2

Danail Petrov
Danail Petrov

Reputation: 1875

If you're looping through all the lines/rows of the file, you can use string's .replace() function to get rid off the quotes (if you don't need them later-on for other purposes.).

>>> import csv
>>> with open('eggs.csv', 'rb') as csvfile:
...     my_file = csv.reader(codecs.open(filename, 'r', encoding='latin-1')
...       delimiter=";",
...       dialect=csv.excel_tab)
...     )
...     for row in my_file:
...         (model,price,currency) = row
...         model.replace('"','')
...         price.replace('"','')
...         currency.replace('"','')v
...         print 'Model is: %s (costs %s%s).' % (model,price,currency)
>>> 
Model is: MacBookPro (costs 2200.00EUR).

Upvotes: 0

Related Questions