maudulus

Reputation: 11035

Reading csv with Python breaks on square bracket

I have a CSV file that I'm trying to read with Python. Its lines look like the following:

10x org il,"["Modiin, Israel"]","["no current price posted"]","["Modiin: no current size posted"]","{ "Python Bootcamp": {"Price: ","["http://www.10x.org.il/"]","[{ "j": 31.9077, "C": 35.0076 }]"

It breaks on the first square bracket of "[{ "j": 31.9077, "C": 35.0076 }]" with the error message SyntaxError: invalid syntax.

I am using the following Python code to read the file:

import csv

with open('programming_bootcamps_csv.csv', 'rb') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(row["campName"])

I have been reading documentation on why a square bracket would break this, but haven't come to any conclusion about what the issue is or how to fix it.

Upvotes: 0

Views: 1806

Answers (2)

Graeme Stuart

Reputation: 6053

Don't manually pre-process data unless absolutely necessary

I have reconstructed your data according to the comments, adding a header row and doubling the quotes embedded inside each quoted field:

col1,col2,col3,col4,col5,col6,col7
10x org il,"[""Modiin, Israel""]","[""no current price posted""]","[""Modiin: no current size posted""]","{ ""Python Bootcamp"": {""Price: ","[""http://www.10x.org.il/""]","[{ ""j"": 31.9077, ""C"": 35.0076 }]"

Now everything works as expected.

import csv

with open('data.csv', 'rb') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        for key, value in row.items():
            print "{}: {}".format(key, value)

output:

col6: ["http://www.10x.org.il/"]
col7: [{ "j": 31.9077, "C": 35.0076 }]
col4: ["Modiin: no current size posted"]
col5: { "Python Bootcamp": {"Price:
col2: ["Modiin, Israel"]
col3: ["no current price posted"]
col1: 10x org il

The curious format looks like a broken mixture of JSON and CSV. Your curly brackets don't match up (col5 is cut off mid-object), and there is no consistency between fields. Since the file looks automatically generated, I would strongly suggest fixing the data format upstream, in the program that generated it.
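As a rough illustration of what "fixing it upstream" could look like, here is a minimal Python 3 sketch, assuming the generating program is also Python and has the structured values available before writing. The helper name write_row and the sample values are hypothetical. The idea is to JSON-encode each structured value and let csv.writer handle the quoting, so embedded quotes are doubled automatically.

```python
import csv
import io
import json

def write_row(out, fields):
    """Write one CSV row, JSON-encoding every value that is not already a string."""
    writer = csv.writer(out)
    writer.writerow([f if isinstance(f, str) else json.dumps(f) for f in fields])

# Hypothetical sample values, modelled on the row in the question.
buf = io.StringIO()
write_row(buf, ["10x org il", ["Modiin, Israel"], [{"j": 31.9077, "C": 35.0076}]])
line = buf.getvalue()

# Reading the row back gives the JSON text unchanged, ready for json.loads().
row = next(csv.reader(io.StringIO(line)))
coords = json.loads(row[2])
```

Because the csv module does the escaping on both sides, the reader never has to guess where a field ends.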

However, if you can't fix the data upstream, further processing should be simple: use json.loads() on the fields that hold valid JSON, and fall back to brute-force string handling where necessary.
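A minimal Python 3 sketch of that post-processing step, assuming the rows have already been read into dictionaries like the output above. The helper name parse_field is hypothetical. Fields that are not valid JSON, such as the truncated col5, are left as plain strings.

```python
import json

def parse_field(value):
    """Return the decoded JSON value, or the original string if it is not valid JSON."""
    try:
        return json.loads(value)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return value

# A row as DictReader would return it (values copied from the output above).
row = {
    "col1": "10x org il",
    "col2": '["Modiin, Israel"]',
    "col5": '{ "Python Bootcamp": {"Price: ',
    "col7": '[{ "j": 31.9077, "C": 35.0076 }]',
}
parsed = {key: parse_field(value) for key, value in row.items()}
```

One caveat with this approach: a plain field that happens to be valid JSON on its own, such as 123 or true, would also be decoded, so it may be safer to apply parse_field only to the columns known to hold JSON.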

Upvotes: 1

krethika

Reputation: 4486

Instead of using DictReader, write your own parser.

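def dict_reader(fn, header=None):
    with open(fn) as f:
        for line in f:
            line = line.rstrip("\r\n")  # drop the newline so endswith() can match
            row = []
            while line:
                field, _, line = line.partition(",")
                # A field that opens with "[" may contain commas: keep joining
                # pieces until it closes with ]" or the line runs out.
                while field.startswith('"["') and not field.endswith('"]"') and line:
                    rest, _, line = line.partition(",")
                    field += "," + rest  # restore the comma partition() removed
                row.append(field)
            if header is None:
                header = row  # the first line supplies the column names
                continue
            yield dict(zip(header, row))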

Upvotes: 0
