Reputation: 2092
The problem:
I have been having trouble finding the average of a column in a CSV file using Python's csv.DictReader.
I have tried:
Accessing a column by name, as below. This works, but it requires knowing the column name, and I'm unsure how to loop over reader.fieldnames so that each column ends up in its own list, rather than all the columns' data being mixed into the same list:
for r in reader:
    print(r.get("Price"))
Example of the loop over the field names:
for i in reader.fieldnames:
    for r in reader:
        print(r.get(i))
This runs, but it prints one element from each column for each row. That makes it difficult to assemble a list of, say, all prices or all names, because appending inside the loop just rebuilds the DictReader's rows in list form.
Question
How can I read a single entire column from a DictReader so that I can access each column individually as a list and perform operations on it?
Note: so far I have tried appending each element inside the loop, but that results in an N-element list with 4 elements in each row.
Upvotes: 1
Views: 9764
Reputation: 23773
data.csv:
'''
one, two, three
1,2,3
4,5,6
7,8,9
10,11,12
'''
Use a plain csv.reader object: read the header row, transpose the remaining rows, then combine the headers with the transposed data to create a dict.
import csv

with open('data.csv') as f:
    reader = csv.reader(f)
    headers = next(reader)
    # transpose the data
    # --> columns become rows and rows become columns
    data = zip(*reader)
    # create a dictionary by combining the headers with the data
    d = dict(zip(headers, data))
>>> from pprint import pprint
>>> pprint(d)
{' three': ('3', '6', '9', '12'),
' two': ('2', '5', '8', '11'),
'one': ('1', '4', '7', '10')}
>>>
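To get back to the original goal of an average, here is a minimal sketch of the same transpose-with-zip idea that also converts each column to numbers. It assumes every value is numeric, and it uses io.StringIO with the sample data in place of open('data.csv'); column_averages is a hypothetical helper name. Passing skipinitialspace=True to csv.reader also removes the leading spaces seen in the ' two' and ' three' keys above.

```python
import csv
import io
from statistics import mean

# Sample contents mirroring data.csv above; io.StringIO stands in for open('data.csv').
SAMPLE = """one, two, three
1,2,3
4,5,6
7,8,9
10,11,12
"""

def column_averages(f):
    """Return {header: average} using the transpose-with-zip approach (hypothetical helper)."""
    # skipinitialspace=True ignores the blank after each comma,
    # so the headers come out as 'two' rather than ' two'.
    reader = csv.reader(f, skipinitialspace=True)
    headers = next(reader)
    columns = zip(*reader)  # one tuple of strings per column
    # Assumes every cell is numeric; float() raises ValueError otherwise.
    return {name: mean(float(v) for v in values)
            for name, values in zip(headers, columns)}

averages = column_averages(io.StringIO(SAMPLE))
print(averages)  # {'one': 5.5, 'two': 6.5, 'three': 7.5}
```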
Upvotes: 1
Reputation: 2124
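You could use the pandas library. Its read_csv function loads a CSV file into a DataFrame, and indexing the DataFrame by column name returns that whole column as a Series, which has methods such as mean().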
import pandas as pd
df = pd.read_csv(csv_file)
saved_column = df['column_name']
Upvotes: 3
Reputation: 2507
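If you're fine looping over the data once for each column you want to read, just build a dict comprehension of list comprehensions. A DictReader can only be iterated once, so save its rows in a list first and loop over that list:
rows = list(reader)
columns = {fieldname: [row.get(fieldname) for row in rows] for fieldname in reader.fieldnames}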
There's not really a better way to do it, given the nature of the file: a CSV file is a series of rows, so turning it into columns is going to be a little wasteful. If you only want certain columns extracted, replace reader.fieldnames with a list of just those names.
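A DictReader is a one-shot iterator, so a comprehension that loops over the reader itself once per field name only fills the first list; the later lists come out empty. A minimal sketch of working from a saved list of rows instead, assuming hypothetical Name and Price columns and in-memory sample data:

```python
import csv
import io

# Hypothetical sample data; io.StringIO stands in for an open CSV file.
SAMPLE = """Name,Price
apple,1.50
banana,0.25
cherry,3.00
"""

reader = csv.DictReader(io.StringIO(SAMPLE))

# Save the rows once; after this, iterating over reader again yields nothing.
rows = list(reader)

# Hypothetical selection of columns; use reader.fieldnames to extract all of them.
wanted = ["Price"]
columns = {name: [row[name] for row in rows] for name in wanted}

# Values are strings, so convert before doing arithmetic.
prices = [float(p) for p in columns["Price"]]
print(columns)  # {'Price': ['1.50', '0.25', '3.00']}
print(sum(prices) / len(prices))
```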
If you really need to only read the file once, though:
columns = {}
for row in reader:
    for fieldname in reader.fieldnames:
        columns.setdefault(fieldname, []).append(row.get(fieldname))
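The single-pass version can be wrapped into something runnable that also finishes the averaging. This sketch pre-creates one empty list per field name instead of calling setdefault (either works); read_columns, the sample data, and its Name/Price/Qty columns are hypothetical.

```python
import csv
import io
from statistics import mean

# Hypothetical sample data; io.StringIO stands in for an open CSV file.
SAMPLE = """Name,Price,Qty
apple,1.50,4
banana,0.25,12
cherry,3.00,1
"""

def read_columns(f):
    """Collect every column into its own list in a single pass (hypothetical helper)."""
    reader = csv.DictReader(f)
    # Accessing fieldnames reads the header row before the loop starts.
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            columns[name].append(row[name])
    return columns

columns = read_columns(io.StringIO(SAMPLE))
# Columns hold strings; convert only the numeric ones before averaging.
avg_price = mean(float(p) for p in columns["Price"])
print(f"average price: {avg_price:.2f}")  # average price: 1.58
```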
Upvotes: 2