Reputation: 11035
I have a CSV file that I'm trying to read with Python. It has lines that look like the following:
10x org il,"["Modiin, Israel"]","["no current price posted"]","["Modiin: no current size posted"]","{ "Python Bootcamp": {"Price: ","["http://www.10x.org.il/"]","[{ "j": 31.9077, "C": 35.0076 }]"
but it breaks on the first square bracket of this: "[{ "j": 31.9077, "C": 35.0076 }]"
with the error message SyntaxError: invalid syntax
I am using the following Python code to read the file:
import csv
with open('programming_bootcamps_csv.csv', 'rb') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(row["campName"])
I have been reading through some documents on why a square bracket would break this, and haven't come to any conclusions on what the issue is or how to fix it.
Upvotes: 0
Views: 1806
Reputation: 6053
I have reconstructed your data according to the comments as follows:
col1,col2,col3,col4,col5,col6,col7
10x org il,"[""Modiin, Israel""]","[""no current price posted""]","[""Modiin: no current size posted""]","{ ""Python Bootcamp"": {""Price: ","[""http://www.10x.org.il/""]","[{ ""j"": 31.9077, ""C"": 35.0076 }]"
Now everything works as expected.
import csv
with open('data.csv', 'rb') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        for key, value in row.items():
            print "{}: {}".format(key, value)
output:
col6: ["http://www.10x.org.il/"]
col7: [{ "j": 31.9077, "C": 35.0076 }]
col4: ["Modiin: no current size posted"]
col5: { "Python Bootcamp": {"Price:
col2: ["Modiin, Israel"]
col3: ["no current price posted"]
col1: 10x org il
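The doubled quotes in the reconstructed row are the standard CSV escape for a quote character inside a quoted field. As a small sketch (written for Python 3, and using only three of the seven fields for brevity), this is what csv.writer produces when it is given the raw strings, and csv.reader turns the line back into the same strings:

```python
import csv
import io

# Three of the raw field values, exactly as they should come out after parsing.
fields = ['10x org il', '["Modiin, Israel"]', '[{ "j": 31.9077, "C": 35.0076 }]']

# csv.writer quotes any field containing a comma or a quote,
# and doubles every quote character inside it.
buf = io.StringIO()
csv.writer(buf).writerow(fields)
line = buf.getvalue()
print(line)

# Reading the line back recovers the original strings unchanged.
roundtrip = next(csv.reader(io.StringIO(line)))
print(roundtrip == fields)
```

If the program that generates the file writes its rows through a CSV library like this instead of joining strings by hand, the quoting problem goes away.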
The curious format looks like a broken mixture of JSON and CSV: the curly brackets don't match up (col5 opens two braces and closes none) and the fields are not consistent with each other. Since the file looks automatically generated, I would strongly suggest fixing the data format upstream, in the program that generated it.
However, if you can't fix the data upstream, further processing should be simple: use json.loads() on the fields that contain valid JSON, and fall back to brute-force string handling for the ones that don't.
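A rough sketch of that further processing, written for Python 3 (unlike the Python 2 snippet above) and assuming the reconstructed header col1 to col7. The helper name decode_row is made up for illustration. Each value is tried with json.loads(), and the raw string is kept wherever the JSON is broken or the field is plain text:

```python
import csv
import io
import json

# Header and row copied from the reconstructed data above.
DATA = '''col1,col2,col3,col4,col5,col6,col7
10x org il,"[""Modiin, Israel""]","[""no current price posted""]","[""Modiin: no current size posted""]","{ ""Python Bootcamp"": {""Price: ","[""http://www.10x.org.il/""]","[{ ""j"": 31.9077, ""C"": 35.0076 }]"
'''


def decode_row(row):
    """Decode each value as JSON; keep the raw string when decoding fails."""
    decoded = {}
    for key, value in row.items():
        try:
            decoded[key] = json.loads(value)
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            decoded[key] = value
    return decoded


rows = [decode_row(row) for row in csv.DictReader(io.StringIO(DATA))]

print(rows[0]["col2"])          # a list with one string
print(rows[0]["col7"][0]["j"])  # a float taken from the decoded dict
print(rows[0]["col5"])          # broken JSON, left as the raw string
```

The truncated col5 value is the case where brute force would still be needed, because no JSON parser can recover the missing closing braces.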
Upvotes: 1
Reputation: 4486
Instead of using DictReader, write your own parser.
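def dict_reader(fn, header=None):
    with open(fn) as f:
        for line in f:
            line = line.rstrip("\r\n")  # drop the newline so the last field ends with ]"
            row = []
            while line:
                field, _, line = line.partition(",")
                # A field that opens with "[ may contain commas itself: keep
                # joining pieces (restoring the comma) until it closes with ]".
                while field.startswith('"[') and not field.endswith(']"') and line:
                    rest, _, line = line.partition(",")
                    field += "," + rest
                row.append(field)
            if header is None:
                # The first row read is used as the header unless one was passed in.
                header = row
                continue
            yield dict(zip(header, row))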
Upvotes: 0