Reputation: 332
I have recently started learning the csv module. Suppose we have this CSV file:
John,Jeff,Judy,
21,19,32,
178,182,169,
85,74,57,
And we want to read this file and create a dictionary containing names (as keys) and totals of each column (as values). So in this case we would end up with:
d = {"John" : 284, "Jeff" : 275, "Judy" : 258}
So I wrote this code, which apparently works, but I am not satisfied with it and was wondering if anyone knows a better or more efficient/elegant way of doing this, because there are just too many lines in there :D (Or maybe a way to generalize it a bit, i.e. for when we do not know in advance how many fields there are.)
d = {}
import csv
with open("file.csv") as f:
    readObject = csv.reader(f)
    totals0 = 0
    totals1 = 0
    totals2 = 0
    totals3 = 0
    currentRowTotal = 0
    for row in readObject:
        currentRowTotal += 1
        if currentRowTotal == 1:
            continue
        totals0 += int(row[0])
        totals1 += int(row[1])
        totals2 += int(row[2])
        if row[3] == "":
            totals3 += 0
    f.close()
with open("file.csv") as f:
    readObject = csv.reader(f)
    currentRow = 0
    for row in readObject:
        while currentRow <= 0:
            d.update({row[0] : totals0})
            d.update({row[1] : totals1})
            d.update({row[2] : totals2})
            d.update({row[3] : totals3})
            currentRow += 1
    print(d)
    f.close()
Thanks very much for any answer :)
Upvotes: 1
Views: 408
Reputation: 1884
Based on Michasel's solution, I would try with less code, fewer variables, and no dependency on NumPy:
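import csv
from functools import reduce  # builtin on Python 2, must be imported on Python 3

with open("so.csv") as f:
    reader = csv.reader(f)
    titles = next(reader)
    # Add the rows element-wise; int('') would fail on a trailing empty field
    sum_result = reduce(lambda x, y: [int(a) + int(b) for a, b in zip(x, y)], list(reader))
print(dict(zip(titles, sum_result)))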
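If you want to avoid reduce as well, a related standard-library variant is to transpose the rows into columns with zip(*rows) and sum each column. This is only a sketch, not the approach above: the function name column_totals and the in-memory SAMPLE string are made up for illustration, and it assumes the rows have no trailing empty field.

```python
import csv
import io

# Sample data from the question, written WITHOUT the trailing commas
# (an assumption made for this sketch).
SAMPLE = "John,Jeff,Judy\n21,19,32\n178,182,169\n85,74,57\n"


def column_totals(lines):
    """Return {heading: column sum} for CSV lines with one heading row.

    `lines` is any iterable of CSV text lines (an open file works too).
    """
    reader = csv.reader(lines)
    titles = next(reader)
    # zip(*reader) regroups the remaining rows into one tuple per column.
    columns = zip(*reader)
    return {title: sum(int(v) for v in col) for title, col in zip(titles, columns)}


totals = column_totals(io.StringIO(SAMPLE))
print(totals)  # {'John': 284, 'Jeff': 275, 'Judy': 258}
```

With a real file you would pass the open file object instead of the StringIO, e.g. inside a with open("so.csv") as f: block.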
Upvotes: 0
Reputation: 238269
Not sure if you can use pandas, but you can get your dict as follows:
import pandas as pd
df = pd.read_csv('data.csv')
print(dict(df.sum()))
Gives:
{'Jeff': 275, 'Judy': 258, 'John': 284}
Upvotes: 3
Reputation: 12239
Use the top row to figure out what the column headings are. Initialize a dictionary of totals based on the headings.
import csv
with open("file.csv") as f:
    reader = csv.reader(f)
    titles = next(reader)
    while titles[-1] == '':
        titles.pop()
    num_titles = len(titles)
    totals = { title: 0 for title in titles }
    for row in reader:
        for i in range(num_titles):
            totals[titles[i]] += int(row[i])
print(totals)
Let me add that you don't have to close the file after the with block. The whole point of with is that it takes care of closing the file.
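A quick way to convince yourself of this is to check the file object's closed attribute inside and after the block. The throwaway temp file below is only there to make the sketch self-contained; the variable names are illustrative.

```python
import os
import tempfile

# Create a small throwaway CSV file so the example does not depend on file.csv.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w") as out:
    out.write("John,Jeff,Judy\n21,19,32\n")

with open(path) as f:
    first_line = f.readline()
    closed_inside = f.closed  # still open while we are inside the block

closed_after = f.closed  # the with statement has closed the file by now

os.remove(path)  # clean up the temp file
print(closed_inside, closed_after)  # False True
```

So an explicit f.close() after (or at the end of) a with block is redundant.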
Also, let me mention that the data you posted appears to have four columns:
John,Jeff,Judy,
21,19,32,
178,182,169,
85,74,57,
That's why I did this:
while titles[-1] == '':
    titles.pop()
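To see the fourth column for yourself, here is a small sketch that parses the first two lines of the posted data from an in-memory string (SAMPLE, raw_titles and the extra emptiness guard in the loop are my additions for illustration):

```python
import csv
import io

# First two lines of the data as posted, trailing commas included.
SAMPLE = "John,Jeff,Judy,\n21,19,32,\n"

rows = list(csv.reader(io.StringIO(SAMPLE)))
raw_titles = rows[0]
print(raw_titles)  # ['John', 'Jeff', 'Judy', '']

# Work on a copy and drop the empty trailing headings.
# The `titles and` guard just avoids an IndexError on an empty heading row.
titles = list(raw_titles)
while titles and titles[-1] == '':
    titles.pop()
print(titles)  # ['John', 'Jeff', 'Judy']
```

The data rows keep their empty fourth field, which is why the totals loop only indexes the first len(titles) entries of each row.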
Upvotes: 0
Reputation: 3350
It's a little dirty, but try this (operating without the empty last column):
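#!/usr/bin/python
import csv
from functools import reduce  # builtin on Python 2, must be imported on Python 3
import numpy
with open("file.csv") as f:
    reader = csv.reader(f)
    headers = next(reader)
    # Convert each row to a list of ints, then add the rows element-wise
    sums = reduce(numpy.add, [[int(v) for v in x] for x in reader], [0]*len(headers))
for name, total in zip(headers,sums):
    print("{}'s total is {}".format(name,total))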
Upvotes: 0