Reputation: 3923
I am new to Python and I have a set of values like the following:
(3, '655')
(3, '645')
(3, '641')
(4, '602')
(4, '674')
(4, '620')
This is generated from a CSV file with the following code (Python 2.6):
import csv
import time
with open('file.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        date = time.strptime(row[3], "%a %b %d %H:%M:%S %Z %Y")
        data = date, row[5]
        month = data[0][1]
        avg = data[1]
        monthAvg = month, avg
        print monthAvg
What I would like to do is get an average of the values based on the keys:
(3, 647)
(4, 632)
My initial thought was to create a new dictionary.
loop through the original dictionary
    if the key does not exist
        add the key and value to the new dictionary
    else
        sum the value to the existing value in the new dictionary
I'd also have to keep a count of the number of values per key so I could produce the average. That seems like a lot of work, though, and I wasn't sure if there was a more elegant way to accomplish this.
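The plan above can be sketched directly with two plain dictionaries, one for running totals and one for counts. This is only a minimal sketch in Python 3 syntax; the names `average_by_key`, `pairs`, `totals`, and `counts` are made up for illustration, and the sample data is copied from the values listed earlier.

```python
def average_by_key(pairs):
    """Return {key: integer average} for an iterable of (key, numeric string) pairs."""
    totals = {}  # key -> running sum of values
    counts = {}  # key -> number of values seen for that key
    for key, value in pairs:
        totals[key] = totals.get(key, 0) + int(value)
        counts[key] = counts.get(key, 0) + 1
    # Floor division mirrors the whole-number averages asked for in the question.
    return {key: totals[key] // counts[key] for key in totals}


pairs = [(3, '655'), (3, '645'), (3, '641'),
         (4, '602'), (4, '674'), (4, '620')]
print(average_by_key(pairs))  # {3: 647, 4: 632}
```

Using `dict.get` with a default of 0 avoids the explicit "if the key does not exist" branch.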
Thank you.
Upvotes: 4
Views: 205
Reputation: 107287
You can use collections.defaultdict to create a dictionary with unique keys and lists of values:
>>> l=[(3, '655'),(3, '645'),(3, '641'),(4, '602'),(4, '674'),(4, '620')]
>>> from collections import defaultdict
>>> d=defaultdict(list)
>>>
>>> for i,j in l:
...     d[i].append(int(j))
...
>>> d
defaultdict(<type 'list'>, {3: [655, 645, 641], 4: [602, 674, 620]})
Then use a list comprehension to create the expected pairs:
>>> [(i,sum(j)/len(j)) for i,j in d.items()]
[(3, 647), (4, 632)]
And within your code you can do:
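from collections import defaultdict
d = defaultdict(list)  # month -> list of values seen for that month
with open('file.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        date = time.strptime(row[3], "%a %b %d %H:%M:%S %Z %Y")
        data = date, row[5]
        month = data[0][1]
        avg = data[1]
        d[month].append(int(avg))
# print once, after every row has been read
print [(i,sum(j)/len(j)) for i,j in d.items()]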
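One caveat about `sum(j)/len(j)` in the snippets above: under Python 2, dividing two integers discards the fractional part, which is why the output shows whole numbers; under Python 3 the same expression returns a float. Here is a minimal sketch of the same grouping in Python 3 syntax with both kinds of division made explicit. The function name `group_averages` and its `exact` flag are hypothetical, chosen only for this illustration.

```python
from collections import defaultdict


def group_averages(pairs, exact=False):
    """Group values by key, then average each group.

    exact=False uses floor division (what int / int does in Python 2);
    exact=True uses true division and returns floats.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(int(value))
    if exact:
        return {k: sum(v) / len(v) for k, v in groups.items()}
    return {k: sum(v) // len(v) for k, v in groups.items()}


l = [(3, '655'), (3, '645'), (3, '641'), (4, '602'), (4, '674'), (4, '620')]
print(group_averages(l))                                 # {3: 647, 4: 632}
print(group_averages([(1, '1'), (1, '2')]))              # {1: 1}
print(group_averages([(1, '1'), (1, '2')], exact=True))  # {1: 1.5}
```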
Upvotes: 4
Reputation: 10298
Use pandas. It is designed specifically for this sort of task, so you can express it in only a small amount of code (what you want to do is a one-liner). It will also typically be much faster than the pure-Python approaches when given a lot of values.
import pandas as pd
a=[(3, '655'),
(3, '645'),
(3, '641'),
(4, '602'),
(4, '674'),
(4, '620')]
res = pd.DataFrame(a).astype('float').groupby(0).mean()
print(res)
Gives:
     1
0
3  647
4  632
Here is a multi-line version, showing what happens:
df = pd.DataFrame(a) # construct a structure containing data
df = df.astype('float') # convert data to float values
grp = df.groupby(0) # group the values by the value in the first column
df = grp.mean() # take the mean of each group
Further, if you want to use a csv file, it is even easier since you don't need to parse the csv file yourself (I use made-up names for the columns I don't know):
import pandas as pd
# read_csv takes column labels via `names`; `header=None` means the file has no header row
df = pd.read_csv('file.csv', names=['col0', 'col1', 'col2', 'date', 'col4', 'data'], index_col=False, header=None)
df['month'] = pd.DatetimeIndex(df['date']).month
df = df[['month', 'data']].groupby('month').mean()  # select columns with a list, not a tuple
Upvotes: 2
Reputation: 113940
import itertools,csv
from dateutil.parser import parse as dparse
def make_tuples(fname='file.csv'):
    with open(fname, 'rb') as csvfile:
        rows = list(csv.reader(csvfile))
    month_of = lambda row: dparse(row[3]).strftime("%b")
    # groupby only groups adjacent rows, so sort in case the file is not ordered by month
    rows.sort(key=month_of)
    for month, group in itertools.groupby(rows, month_of):
        values = [float(row[5]) for row in group]  # column 5 holds strings
        yield (month, sum(values)/len(values))
print dict(make_tuples('some_csv.csv'))
is one way to do it ...
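One thing to keep in mind with `itertools.groupby` is that it only groups consecutive items with equal keys, so input that is not already ordered has to be sorted by the same key first. A small sketch on in-memory pairs rather than a CSV file, in Python 3 syntax (the name `averages_with_groupby` is made up for illustration):

```python
from itertools import groupby
from operator import itemgetter


def averages_with_groupby(pairs):
    """Average the values of (key, numeric string) pairs using itertools.groupby."""
    # groupby only merges adjacent items, so sort by key before grouping.
    ordered = sorted(pairs, key=itemgetter(0))
    result = {}
    for key, group in groupby(ordered, key=itemgetter(0)):
        values = [float(value) for _, value in group]
        result[key] = sum(values) / len(values)
    return result


# Deliberately interleaved keys to show why the sort matters.
pairs = [(3, '655'), (4, '602'), (3, '645'), (4, '674'), (3, '641'), (4, '620')]
print(averages_with_groupby(pairs))  # {3: 647.0, 4: 632.0}
```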
Upvotes: 1
Reputation: 16711
Use a dictionary comprehension, where items is the list of tuple pairs (dictionary comprehensions require Python 2.7 or later):
data = {i:[int(b) for a, b in items if a == i] for i in set(a for a, b in items)}
data = {a:int(float(sum(b))/float(len(b))) for a, b in data.items()} # averages
Upvotes: 1