Reputation: 3923

Averaging the values in a dictionary based on the key

I am new to Python and I have a set of values like the following:

(3, '655')
(3, '645')
(3, '641')
(4, '602')
(4, '674')
(4, '620')

This is generated from a CSV file with the following code (Python 2.6):

import csv
import time

with open('file.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        date = time.strptime(row[3], "%a %b %d %H:%M:%S %Z %Y")
        data = date, row[5]

        month = data[0][1]
        avg = data[1]
        monthAvg = month, avg
        print monthAvg

What I would like to do is get an average of the values based on the keys:

(3, 647)
(4, 632)

My initial thought was to create a new dictionary.

loop through the original dictionary
    if the key does not exist
        add the key and value to the new dictionary
    else
        sum the value to the existing value in the new dictionary

I'd also have to keep a count of the number of keys so I could produce the average. Seems like a lot of work though - I wasn't sure if there was a more elegant way to accomplish this.
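The plan described above (a running sum plus a count per key) can be sketched roughly as follows. This is only an illustration using the sample pairs from the question; the names `totals`, `counts` and `averages` are made up for the example.

```python
# Sample (month, value) pairs copied from the question.
pairs = [(3, '655'), (3, '645'), (3, '641'),
         (4, '602'), (4, '674'), (4, '620')]

totals = {}  # key -> running sum of values
counts = {}  # key -> number of values seen

for key, value in pairs:
    # dict.get supplies 0 the first time a key is seen
    totals[key] = totals.get(key, 0) + int(value)
    counts[key] = counts.get(key, 0) + 1

# float() keeps the division exact on Python 2 as well as Python 3
averages = dict((key, totals[key] / float(counts[key])) for key in totals)
print(averages)
```

Keeping two small dictionaries avoids storing every value, at the cost of slightly more bookkeeping.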

Thank you.

Upvotes: 4

Answers (4)

Kasravnd

Reputation: 107287

You can use collections.defaultdict to create a dictionary with unique keys and lists of values:

>>> l=[(3, '655'),(3, '645'),(3, '641'),(4, '602'),(4, '674'),(4, '620')]
>>> from collections import defaultdict
>>> d=defaultdict(list)
>>> 
>>> for i,j in l:
...    d[i].append(int(j))
... 
>>> d
defaultdict(<type 'list'>, {3: [655, 645, 641], 4: [602, 674, 620]})

Then use a list comprehension to create the expected pairs:

>>> [(i,sum(j)/len(j)) for i,j in d.items()]
[(3, 647), (4, 632)]
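The session above is Python 2, where `/` between two integers truncates. As a rough sketch of the same idea on Python 3 (assuming the same sample pairs; `groups` and `averages` are names chosen for this example), `/` performs true division, so the averages come out as floats:

```python
from collections import defaultdict

pairs = [(3, '655'), (3, '645'), (3, '641'),
         (4, '602'), (4, '674'), (4, '620')]

# Collect the integer values under each key.
groups = defaultdict(list)
for key, value in pairs:
    groups[key].append(int(value))

# On Python 3, / is true division; use // instead for truncated integers.
averages = {key: sum(values) / len(values) for key, values in groups.items()}
print(averages)
```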

And within your code you can do:

import csv
import time
from collections import defaultdict

d = defaultdict(list)  # month -> list of values

with open('file.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        date = time.strptime(row[3], "%a %b %d %H:%M:%S %Z %Y")
        month = date[1]  # tm_mon
        d[month].append(int(row[5]))

print [(i,sum(j)/len(j)) for i,j in d.items()]

Upvotes: 4

TheBlackCat

Reputation: 10298

Use pandas. It is designed for exactly this kind of task, so what you want can be written as a one-liner. It is also likely to be considerably faster than the pure-Python approaches when there are a lot of values.

import pandas as pd

a=[(3, '655'),
   (3, '645'),
   (3, '641'),
   (4, '602'),
   (4, '674'),
   (4, '620')]

res = pd.DataFrame(a).astype('float').groupby(0).mean()
print(res)

This gives a mean of 647.0 for group 3 and 632.0 for group 4.

Here is a multi-line version, showing what happens:

df = pd.DataFrame(a)  # construct a structure containing data
df = df.astype('float')  # convert data to float values
grp = df.groupby(0)  # group the values by the value in the first column
df = grp.mean()  # take the mean of each group

Further, if you want to use a csv file, it is even easier since you don't need to parse the csv file yourself (I use made-up names for the columns I don't know):

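import pandas as pd
# read_csv takes the column labels via names=; header=None means the file has no header row
df = pd.read_csv('file.csv', names=['col0', 'col1', 'col2', 'date', 'col4', 'data'], header=None)
df['month'] = pd.DatetimeIndex(df['date']).month
# select the two columns with a list, then average 'data' per month
df = df[['month', 'data']].groupby('month').mean()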

Upvotes: 2

Joran Beasley

Reputation: 113940

import itertools,csv
from dateutil.parser import parse as dparse

def make_tuples(fname='file.csv'):
    with open(fname, 'rb') as csvfile:
        rows = list(csv.reader(csvfile))
    # groupby only merges adjacent rows, so this assumes the file is ordered by month
    for month,data in itertools.groupby(rows,lambda x:dparse(x[3]).strftime("%b")):
        values = [float(row[5]) for row in data]  # column 5 is read as strings
        yield (month,sum(values)/len(values))

print dict(make_tuples('some_csv.csv'))

is one way to do it ...
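One thing to watch with itertools.groupby: it only groups consecutive items that share a key, so input that is not already ordered by key needs sorting first. A minimal sketch of that, assuming plain (month, value) pairs rather than CSV rows (`average_by_key` is a hypothetical helper, not part of any library):

```python
import itertools

# Sample pairs from the question, deliberately shuffled out of key order.
pairs = [(4, '602'), (3, '655'), (4, '674'),
         (3, '645'), (3, '641'), (4, '620')]

def average_by_key(pairs):
    """Yield (key, mean) pairs, sorting first so equal keys are adjacent."""
    ordered = sorted(pairs, key=lambda pair: pair[0])
    for key, group in itertools.groupby(ordered, key=lambda pair: pair[0]):
        values = [int(value) for _, value in group]
        yield key, sum(values) / float(len(values))

result = dict(average_by_key(pairs))
print(result)
```

Without the sorted() call, the shuffled input would produce several short groups per key, and dict() would keep only the last one for each.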

Upvotes: 1

Malik Brahimi

Reputation: 16711

Use a dictionary comprehension, where items is the list of tuple pairs (note that dictionary comprehensions need Python 2.7 or later):

data = {i:[int(b) for a, b in items if a == i] for i in set(a for a, b in items)}
data = {a:int(float(sum(b))/float(len(b))) for a, b in data.items()} # averages

Upvotes: 1
