Payam
Payam

Reputation: 751

reading csv file in Python and creating dictionary

I have a csv dataset like this:

A, 10, USA
B,30, UK
C,4,IT
A,20,UK
B,10,USA

I want to read this csv lines and provide the following output:

A has ran 30 miles with average of 15. 
B has ran 30 miles with average of 20.
C has ran 4 miles with average of 4. 

My solution so far is to read the csv data and convert them to dictionary and then Iterate over them to see how many times 'A' has been repeated and what values it has had to calculate for the average and finally producing the result. I have already written this code, but I have a hard time to efficiently calculate for the number of times that A has been repeated and add up the miles to create my final output. any thoughts to do this in Python? It is sort of easy for me to do this in C#, but I am not that good with Python.

def main(filename):
    f = open(filename,'r')
    labels = ['name','miles','country']
    data = csv.DictReader(f,labels,delimiter=',')
    for line in data:
        print (line['name']+"  " + "has ran" +"   "+  line['miles']+" "+"miles")

    f.close()

Upvotes: 3

Views: 8503

Answers (3)

kdavies4
kdavies4

Reputation: 178

You could do the following:

  1. Add a header row to your data (e.g., "Runner, Distance, Country").
  2. Save it to a file (e.g., log.csv).
  3. Load it with the load_csv function here: https://github.com/kdavies4/ModelicaRes/blob/master/modelicares/util.py#L676 by downloading the util.py file there and doing the following:

    from util import load_csv
    d = load_csv("log.csv")
    

    You should end up with the following dictionary:

    d = {'Runner': ['A', 'B', 'C', 'A', 'B'], 'Distance': [10, 30, 4, 20, 10], 'Country': ['USA', 'UK', 'IT', 'UK', 'USA']}
    

    The load_csv function uses csv.reader to automatically create a dictionary with keys named after the entries it finds in the header row. The dictionary entry for each key is a list of the values from the corresponding column. The documentation is here: http://kdavies4.github.io/ModelicaRes/util.html#modelicares.util.load_csv. The util.py file contains a lot of other functions, but you can safely remove them (as long as you leave import numpy as np at the top).

  4. Run the following code to calculate the average distances:

    # Create counter dictionaries for the statistics.
    from collections import Counter
    n_runs = Counter()
    totals = Counter()
    
    # Calculate the total distance.
    for runner, distance in zip(d['Runner'], d['Distance']):
        n_runs[runner] += 1
        totals[runner] += distance
    
    # Print the results.
    for runner in set(d['Runner']):
        print("%s has run %i miles with an average of %g."
              % (runner, totals[runner], totals[runner]/float(n_runs[runner])))
    

This should give you the results you wanted, and it should be flexible if the data is expanded (by rows or columns).

Kevin

Upvotes: 1

JHS
JHS

Reputation: 11

I your loop use split as follows:

var1, var2, var3 = line.split(",")

This will get you each of the values in the line in each of the variables. Then use the variables as you like.

Upvotes: 1

Blender
Blender

Reputation: 298046

You can use a defaultdict to store the values and then print them out:

import csv

from collections import defaultdict

with open(filename, 'r') as handle:
    reader = csv.DictReader(handle, ['name', 'miles', 'country'])
    data = defaultdict(list)

    for line in reader:
        data[line['name']).append(int(line['miles']))

    for runner, distances in data.items():
        print '{} ran a total of {} miles and an average of {} miles'.format(
            runner, sum(distances), sum(distances) / float(len(distances))
        )

Upvotes: 6

Related Questions