Reputation: 339
I have a CSV file that I want to convert to a dictionary. The file contains 17584980 lines:
ozone,particullate_matter,carbon_monoxide,sulfure_dioxide,nitrogen_dioxide,longitude,latitude,timestamp,avgMeasuredTime,avgSpeed,extID,medianMeasuredTime,TIMESTAMP:1,vehicleCount,_id,REPORT_ID,Lat1,Long1,Lat2,Long2,Distance between 2 points,duration of measurements,ndt in kmh
127,38,62,22,39,10.1050,56.2317,1406859600,74,50,668,74,1406859600,5,20746220,158324,56.2317,10.1050,56.2258,10.1166,1030,52,71
122,35,61,17,34,10.1050,56.2317,1406859900,73,50,668,73,1406859900,6,20746392,158324,56.2317,10.1050,56.2258,10.1166,1030,52,71
117,36,65,24,34,10.1050,56.2317,1406860200,61,60,668,61,1406860200,4,20746723,158324,56.2317,10.1050,56.2258,10.1166,1030,52,71
What I have tried:
#code to generate dictionaries from csv file
import csv
reader = csv.DictReader(open('resultsout.csv'))
output = open("finaldata.py","w")
result = {}
for row in reader:
    for column, value in row.iteritems():
        result.setdefault(column, []).append(float(value))
output.write(str(result))
Error:
Traceback (most recent call last):
  File "dictionaries.py", line 11, in <module>
    result.setdefault(column, []).append(float(value))
ValueError: invalid literal for float(): 32
But this code worked before.
Upvotes: 0
Views: 170
Reputation: 25829
While that is an unsafe way to do what you want (not to mention that there is little reason to convert a huge CSV into a huge Python file), your code should work provided the indentation is correct. The problem stems from data that you didn't show here: some value in it is bad (like 32\x00 or 32\x07) and fails to convert to float.
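To see which values are actually failing before deciding how to handle them, a scan along these lines may help. This is only a sketch: it is written for Python 3 (the question's code targets Python 2), the helper name find_bad_values is made up for illustration, and the in-memory sample stands in for resultsout.csv.

```python
import csv
import io

def find_bad_values(lines, limit=10):
    """Return up to `limit` (line_number, column, repr(value)) tuples
    for cells that float() rejects. Hypothetical helper, not from the question."""
    reader = csv.DictReader(lines)
    bad = []
    for row in reader:
        for column, value in row.items():
            try:
                float(value)
            except (TypeError, ValueError):
                # repr() makes invisible characters such as \x07 visible
                bad.append((reader.line_num, column, repr(value)))
                if len(bad) >= limit:
                    return bad
    return bad

# Tiny made-up sample standing in for resultsout.csv
sample = io.StringIO("ozone,avgSpeed\n127,50\n32\x07,60\n")
for line_num, column, shown in find_bad_values(sample):
    print(line_num, column, shown)
```

For the real file, pass an open file object instead of the sample. The limit keeps the report short if many of the 17 million rows turn out to be affected.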
Here's how to handle it:
import csv

DEFAULT = 0.0  # value to use when conversion fails

with open("resultsout.csv", "r") as i:
    reader = csv.DictReader(i)
    result = {k: [] for k in reader.fieldnames}
    for row in reader:
        for column, value in row.iteritems():
            try:
                result[column].append(float(value))
            except ValueError:
                result[column].append(DEFAULT)

with open("finaldata.py", "w") as o:
    o.write(str(result))
Or, optionally, you can strip out non-numeric characters before converting, so that the conversion doesn't fail because of some extra non-printable characters:
import csv
import re

STRIP_CHARS = re.compile(r"[^\d.]+")

with open("resultsout.csv", "r") as i:
    reader = csv.DictReader(i)
    result = {k: [] for k in reader.fieldnames}
    for row in reader:
        for column, value in row.iteritems():
            result[column].append(float(STRIP_CHARS.sub("", value)))

with open("finaldata.py", "w") as o:
    o.write(str(result))
Or you can combine both for maximum reliability.
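A combined version might look roughly like the sketch below. It is written for Python 3, and the names to_float and load_columns are made up for illustration. It strips first and falls back to DEFAULT only when nothing convertible is left. Note that the pattern also drops minus signs and exponents; that looks harmless for the rows shown in the question but would silently change negative values.

```python
import csv
import io
import re

DEFAULT = 0.0                          # fallback when conversion still fails
STRIP_CHARS = re.compile(r"[^\d.]+")   # same pattern as in the snippet above

def to_float(value, default=DEFAULT):
    """Strip non-numeric characters, then convert; use `default` on failure."""
    try:
        return float(STRIP_CHARS.sub("", value))
    except (TypeError, ValueError):
        # TypeError: missing cell (None)
        # ValueError: nothing numeric left, e.g. "" or "1.2.3"
        return default

def load_columns(lines):
    """Build {column: [floats]} from an iterable of CSV lines."""
    reader = csv.DictReader(lines)
    result = {k: [] for k in reader.fieldnames}
    for row in reader:
        # Iterate over the header names so surplus cells in a row are ignored
        for column in reader.fieldnames:
            result[column].append(to_float(row[column]))
    return result

# Tiny made-up sample standing in for resultsout.csv
sample = io.StringIO("ozone,avgSpeed\n127,50\n32\x07,60\nn/a,70\n")
print(load_columns(sample))
```

Replacing bad values with 0.0 hides them in the output, so it may be worth counting how often the fallback is used before trusting the result.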
Upvotes: 1