Reputation: 783
I have a CSV file with headers at the top of columns of data as:
a,b,c
1,2,3
4,5,6
7,8,9
and I need to read it in a dict of lists:
desired_result = {'a': [1, 4, 7], 'b': [2, 5, 8], 'c': [3, 6, 9]}
When reading this with DictReader
I am using a nested loop to append the items to the lists:
f = 'path_to_some_csv_file.csv'
dr = csv.DictReader(open(f))
dict_of_lists = dr.next()
for k in dict_of_lists.keys():
dict_of_lists[k] = [dict_of_lists[k]]
for line in dr:
for k in dict_of_lists.keys():
dict_of_lists[k].append(line[k])
The first loop sets all values in the dict to the empty list. The next one loops over every line read in from the CSV file, from which DictReader
creates a dict of key-values. The inner loop appends the value to list matching the corresponding key, so I wind up with the desired list of dicts. I end up having to write this fairly often.
My question is, is there a more Pythonic way of doing this using built-in functions without the nested loop, or a better idiom, or an alternative way to store this data structure such that I can return an indexable list by querying with a key? If so is there also a way to format the data being ingested by column upfront?
Upvotes: 10
Views: 10349
Reputation: 251
Nothing wrong with ford's answer, I'll just add mine here (which makes use of the csv library)
with open(f,'r',encoding='latin1') as csvf:
dialect = csv.Sniffer().sniff(csvf.readline()) # finds the delimiters automatically
csvf.seek(0)
# read file with dialect
rdlistcsv = csv.reader(csvf,dialect)
# save to list of rows
rowslist = [list(filter(None,line)) for line in rdlistcsv]
header = rowslist[0]
data = {}
for i,key in enumerate(header):
ilist = [row[i] for row in rowslist]
data.update({key: ilist})
EDIT: actually, if you do not mind using pandas things get way easier with it:
import pandas
import pandas as pd
import file and save it as pandas dataframe
df = pd.read_csv(inputfile)
turn df into a dictionary
mydict = df.to_dict(orient='list')
This way you use the csv header to define the keys and for each key you have a list of elements (something like an excel column turned to a list)
Upvotes: 1
Reputation: 11826
Depending on what type of data you're storing and if you're ok with using numpy, a good way to do this can be with numpy.genfromtxt
:
import numpy as np
data = np.genfromtxt('data.csv', delimiter=',', names=True)
What this will do is create a numpy Structured Array, which provides a nice interface for querying the data by header name (make sure to use names=True
if you have a header row).
Example, given data.csv
containing:
a,b,c
1,2,3
4,5,6
7,8,9
You can then access elements with:
>>> data['a'] # Column with header 'a'
array([ 1., 4., 7.])
>>> data[0] # First row
(1.0, 2.0, 3.0)
>>> data['c'][2] # Specific element
9.0
>>> data[['a', 'c']] # Two columns
array([(1.0, 3.0), (4.0, 6.0), (7.0, 9.0)],
dtype=[('a', '<f8'), ('c', '<f8')])
genfromtext
also provides a way, as you requested, to "format the data being ingested by column up front."
converters : variable, optional
The set of functions that convert the data of a column to a value. The converters can also be used to provide a default value for missing data:
converters = {3: lambda s: float(s or 0)}
.
Upvotes: 13
Reputation: 103864
You can use dict and set comprehensions to make your intent more obvious:
dr=csv.DictReader(f)
data={k:[v] for k, v in dr.next().items()} # create the initial dict of lists
for line_dict in dr:
{data[k].append(v) for k, v in line_dict.items()} # append to each
You can use Alex Martelli's method to flatten a list of lists in Python to flatten an iterator of iterators, which further reduces the first form to:
dr=csv.DictReader(f)
data={k:[v] for k, v in dr.next().items()}
{data[k].append(v) for line_dict in dr for k, v in line_dict.items()}
On Python 2.X, consider using {}.iteritems vs {}.items() if your csv file is sizable.
Further example:
Assume this csv file:
Header 1,Header 2,Header 3
1,2,3
4,5,6
7,8,9
Now suppose you want a dict of lists of each value converted to a float or int. You can do:
def convert(s, converter):
try:
return converter(s)
except Exception:
return s
dr=csv.DictReader(f)
data={k:[convert(v, float)] for k, v in dr.next().items()}
{data[k].append(convert(v, float)) for line_dict in dr for k, v in line_dict.items()}
print data
# {'Header 3': [3.0, 6.0, 9.0], 'Header 2': [2.0, 5.0, 8.0], 'Header 1': [1.0, 4.0, 7.0]}
Upvotes: -2
Reputation: 363607
If you're willing to use a third-party library, then the merge_with
function from Toolz makes this whole operation a one-liner:
dict_of_lists = merge_with(list, *csv.DictReader(open(f)))
Using only the stdlib, a defaultdict
makes the code less repetitive:
from collections import defaultdict
import csv
f = 'test.csv'
dict_of_lists = defaultdict(list)
for record in DictReader(open(f)):
for key, val in record.items(): # or iteritems in Python 2
dict_of_lists[key].append(val)
If you need to do this often, factor it out into a function, e.g. transpose_csv
.
Upvotes: 2