Mikasa

Reputation: 339

CSV to dictionary conversion

I have this CSV file and want to convert it to a dictionary. The file contains 17584980 lines.

ozone,particullate_matter,carbon_monoxide,sulfure_dioxide,nitrogen_dioxide,longitude,latitude,timestamp,avgMeasuredTime,avgSpeed,extID,medianMeasuredTime,TIMESTAMP:1,vehicleCount,_id,REPORT_ID,Lat1,Long1,Lat2,Long2,Distance between 2 points,duration of measurements,ndt in kmh
127,38,62,22,39,10.1050,56.2317,1406859600,74,50,668,74,1406859600,5,20746220,158324,56.2317,10.1050,56.2258,10.1166,1030,52,71
122,35,61,17,34,10.1050,56.2317,1406859900,73,50,668,73,1406859900,6,20746392,158324,56.2317,10.1050,56.2258,10.1166,1030,52,71
117,36,65,24,34,10.1050,56.2317,1406860200,61,60,668,61,1406860200,4,20746723,158324,56.2317,10.1050,56.2258,10.1166,1030,52,71

What I have tried:

#code to generate dictionaries from csv file
import csv

reader = csv.DictReader(open('resultsout.csv'))

output = open("finaldata.py","w")

result = {}
for row in reader:
    for column, value in row.iteritems():
    result.setdefault(column, []).append(float(value))

output.write(str(result))

Error:

Traceback (most recent call last):
  File "dictionaries.py", line 11, in <module>
    result.setdefault(column, []).append(float(value))
ValueError: invalid literal for float(): 32

But this code worked before.

Upvotes: 0

Views: 170

Answers (1)

zwer

Reputation: 25829

While that is an unsafe way to do what you want (not to mention that there is little reason to convert a huge CSV into a huge Python file), your code should work once you fix the indentation. The problem stems from data that you didn't show here: some value in it is bad (like 32\x00 or 32\x07), so converting it to float fails.
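To confirm which values are at fault before deciding how to handle them, a small diagnostic along these lines may help. It is only a sketch: the helper name find_bad_values and the in-memory sample are made up for illustration, and it is written for Python 3 (where .items() replaces .iteritems()). Printing repr() of the value makes otherwise invisible control characters show up.

```python
import csv
import io


def find_bad_values(lines, limit=10):
    """Return up to `limit` (line_number, column, repr(value)) tuples for cells that float() rejects."""
    bad = []
    reader = csv.DictReader(lines)
    for row in reader:
        for column, value in row.items():
            try:
                float(value)
            except (TypeError, ValueError):
                # TypeError covers rows with missing or extra fields,
                # where DictReader supplies None or a list instead of a string.
                bad.append((reader.line_num, column, repr(value)))
                if len(bad) >= limit:
                    return bad
    return bad


# Hypothetical two-row sample with a stray BEL character after "32".
sample = io.StringIO("ozone,avgSpeed\n127,50\n122,32\x07\n")
problems = find_bad_values(sample)
print(problems)
```

For the real file, pass an open file object (for example open("resultsout.csv", newline="")) instead of the sample.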

Here's how to handle it:

import csv

DEFAULT = 0.0  # value to use when conversion fails

with open("resultsout.csv", "r") as i:
    reader = csv.DictReader(i)
    result = {k: [] for k in reader.fieldnames}
    for row in reader:
        for column, value in row.iteritems():
            try:
                result[column].append(float(value))
            except ValueError:
                result[column].append(DEFAULT)
    with open("finaldata.py", "w") as o:
        o.write(str(result))

Or, optionally, you can strip out non-numeric characters before converting, which ensures that the conversion doesn't fail because of stray non-printable characters:

import csv
import re

# keep digits, the decimal point and a sign, so negative values are not silently made positive
STRIP_CHARS = re.compile(r"[^\d.+-]+")

with open("resultsout.csv", "r") as i:
    reader = csv.DictReader(i)
    result = {k: [] for k in reader.fieldnames}
    for row in reader:
        for column, value in row.iteritems():
            result[column].append(float(STRIP_CHARS.sub("", value)))
    with open("finaldata.py", "w") as o:
        o.write(str(result))

Or you can combine both for maximum reliability.
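A minimal sketch of that combination, assuming Python 3 (hence no .iteritems()): clean the value first, then fall back to the default if the conversion still fails. The helper names to_float and columns_from_csv and the in-memory sample are hypothetical, and the pattern used here also keeps + and - so that negative values keep their sign, which is an assumption about what the data may contain.

```python
import csv
import io
import re

DEFAULT = 0.0  # value to use when conversion still fails after cleaning
STRIP_CHARS = re.compile(r"[^\d.+-]+")  # drop everything except digits, dot and sign


def to_float(value, default=DEFAULT):
    """Strip unexpected characters, then convert; return `default` on failure."""
    try:
        return float(STRIP_CHARS.sub("", value))
    except (TypeError, ValueError):
        # TypeError: value is None (short row); ValueError: nothing numeric left.
        return default


def columns_from_csv(lines):
    """Build {column_name: [float, ...]} from an iterable of CSV lines."""
    reader = csv.DictReader(lines)
    result = {k: [] for k in reader.fieldnames}
    for row in reader:
        for column in reader.fieldnames:
            result[column].append(to_float(row.get(column)))
    return result


# Hypothetical sample: one value with a stray control character, one non-numeric cell.
sample = io.StringIO("ozone,avgSpeed\n127,50\n122,32\x07\n117,n/a\n")
columns = columns_from_csv(sample)
print(columns)
```

With the real data, the same function can be called on an open file, for example columns_from_csv(open("resultsout.csv", newline="")). Note that substituting a default hides bad rows, so it may be worth counting how often the fallback is used.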

Upvotes: 1
