Reputation: 751
I have a csv dataset like this:
A, 10, USA
B,30, UK
C,4,IT
A,20,UK
B,10,USA
I want to read this csv lines and provide the following output:
A has ran 30 miles with average of 15.
B has ran 30 miles with average of 20.
C has ran 4 miles with average of 4.
My solution so far is to read the csv data and convert them to dictionary and then Iterate over them to see how many times 'A' has been repeated and what values it has had to calculate for the average and finally producing the result. I have already written this code, but I have a hard time to efficiently calculate for the number of times that A has been repeated and add up the miles to create my final output. any thoughts to do this in Python? It is sort of easy for me to do this in C#, but I am not that good with Python.
def main(filename):
f = open(filename,'r')
labels = ['name','miles','country']
data = csv.DictReader(f,labels,delimiter=',')
for line in data:
print (line['name']+" " + "has ran" +" "+ line['miles']+" "+"miles")
f.close()
Upvotes: 3
Views: 8503
Reputation: 178
You could do the following:
Load it with the load_csv function here: https://github.com/kdavies4/ModelicaRes/blob/master/modelicares/util.py#L676 by downloading the util.py file there and doing the following:
from util import load_csv
d = load_csv("log.csv")
You should end up with the following dictionary:
d = {'Runner': ['A', 'B', 'C', 'A', 'B'], 'Distance': [10, 30, 4, 20, 10], 'Country': ['USA', 'UK', 'IT', 'UK', 'USA']}
The load_csv function uses csv.reader to automatically create a dictionary with keys named after the entries it finds in the header row. The dictionary entry for each key is a list of the values from the corresponding column. The documentation is here:
http://kdavies4.github.io/ModelicaRes/util.html#modelicares.util.load_csv. The util.py file contains a lot of other functions, but you can safely remove them (as long as you leave import numpy as np
at the top).
Run the following code to calculate the average distances:
# Create counter dictionaries for the statistics.
from collections import Counter
n_runs = Counter()
totals = Counter()
# Calculate the total distance.
for runner, distance in zip(d['Runner'], d['Distance']):
n_runs[runner] += 1
totals[runner] += distance
# Print the results.
for runner in set(d['Runner']):
print("%s has run %i miles with an average of %g."
% (runner, totals[runner], totals[runner]/float(n_runs[runner])))
This should give you the results you wanted, and it should be flexible if the data is expanded (by rows or columns).
Kevin
Upvotes: 1
Reputation: 11
I your loop use split as follows:
var1, var2, var3 = line.split(",")
This will get you each of the values in the line in each of the variables. Then use the variables as you like.
Upvotes: 1
Reputation: 298046
You can use a defaultdict
to store the values and then print them out:
import csv
from collections import defaultdict
with open(filename, 'r') as handle:
reader = csv.DictReader(handle, ['name', 'miles', 'country'])
data = defaultdict(list)
for line in reader:
data[line['name']).append(int(line['miles']))
for runner, distances in data.items():
print '{} ran a total of {} miles and an average of {} miles'.format(
runner, sum(distances), sum(distances) / float(len(distances))
)
Upvotes: 6