I would like to create a new list of values, my_qty, where each item is equal to the average of all values in d[key]['qty'] where d[key]['start date'] matches a value in my_dates. I think I am close, but am getting hung up on the nested portion.
import datetime
import numpy as np
my_dates = [datetime.datetime(2014, 10, 12, 0, 0), datetime.datetime(2014, 10, 13, 0, 0), datetime.datetime(2014, 10, 14, 0, 0)]
d = {
    'ID1' : {'start date': datetime.datetime(2014, 10, 12, 0, 0) , 'qty': 12},
    'ID2' : {'start date': datetime.datetime(2014, 10, 13, 0, 0) , 'qty': 34},
    'ID3' : {'start date': datetime.datetime(2014, 10, 12, 0, 0) , 'qty': 35},
    'ID4' : {'start date': datetime.datetime(2014, 10, 11, 0, 0) , 'qty': 40},
}
my_qty = []
for item in my_dates:
    my_qty.append([np.mean(x for x in d[key]['qty']) if d[key]['start date'] == my_dates[item]])
print my_qty
Desired Output:
[23.5,34,0]
To clarify the output per request:
[average of d[key]['qty'] where d[key]['start date'] == my_dates[0],
 average of d[key]['qty'] where d[key]['start date'] == my_dates[1],
 average of d[key]['qty'] where d[key]['start date'] == my_dates[2]]
Upvotes: 1
The one line answer:
mean_qty = [np.mean([i['qty'] for i in d.values()
                     if i.get('start date') == day] or 0) for day in my_dates]
In [12]: mean_qty
Out[12]: [23.5, 34.0, 0.0]
The purpose of or 0 is to return 0, as the OP wanted, when there are no qty values for a date, since np.mean on an empty list returns nan by default.
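To see why the fallback works, here is a small sketch that uses only the standard library. It assumes Python 3, and the helper name mean_or_zero is hypothetical: it is a plain-Python stand-in for np.mean(values or 0), not part of the answer above. The key point is that an empty list is falsy, so the or expression falls through to its right-hand operand.

```python
import datetime

# An empty list is falsy, so `or` returns the right-hand operand.
empty_fallback = [] or 0        # evaluates to 0
kept = [12, 35] or 0            # non-empty list is returned unchanged


def mean_or_zero(values):
    # Hypothetical helper: average of values, or 0.0 when the list is empty.
    return sum(values) / len(values) if values else 0.0


# Sample data copied from the question.
my_dates = [datetime.datetime(2014, 10, day) for day in (12, 13, 14)]
d = {
    'ID1': {'start date': datetime.datetime(2014, 10, 12), 'qty': 12},
    'ID2': {'start date': datetime.datetime(2014, 10, 13), 'qty': 34},
    'ID3': {'start date': datetime.datetime(2014, 10, 12), 'qty': 35},
    'ID4': {'start date': datetime.datetime(2014, 10, 11), 'qty': 40},
}

my_qty = [
    mean_or_zero([v['qty'] for v in d.values() if v.get('start date') == day])
    for day in my_dates
]
print(my_qty)  # [23.5, 34.0, 0.0]
```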
If you need speed, then building on jme's excellent second part, you can do this (I cut his time down by 3x by not recalculating the mean until it's called for):
from collections import defaultdict

class RunningMean(object):
    def __init__(self, total=0.0, n=0):
        self.total = total
        self.n = n

    def __iadd__(self, other):
        # Only accumulate here; the division is deferred to mean().
        self.total += other
        self.n += 1
        return self

    def mean(self):
        return (self.total / self.n if self.n else 0)

    def __repr__(self):
        # Report the mean, computed on demand.
        return "RunningMean(mean= %f)" % self.mean()

means = defaultdict(RunningMean)
for v in d.values():
    means[v["start date"]] += (v["qty"])
Out[351]:
[RunningMean(mean= 40.000000),
RunningMean(mean= 34.000000),
RunningMean(mean= 23.500000)]
Upvotes: 2
The simple way is to group the quantities by date into a dictionary:
import collections
quantities = collections.defaultdict(list)
for k, v in d.iteritems():
    quantities[v["start date"]].append(v["qty"])
Then run over that dictionary to compute the means:
means = {k: float(sum(q))/len(q) for k,q in quantities.iteritems()}
Giving:
>>> means
{datetime.datetime(2014, 10, 11, 0, 0): 40.0,
datetime.datetime(2014, 10, 12, 0, 0): 23.5,
datetime.datetime(2014, 10, 13, 0, 0): 34.0}
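To turn that dictionary into the list the question asks for, one option is to look up each entry of my_dates and fall back to 0 for dates that have no records. The sketch below assumes Python 3 (so items() and values() replace iteritems()) and reuses the sample data from the question. It is one possible way to finish the job, not part of the code above.

```python
import collections
import datetime

# Sample data copied from the question.
my_dates = [datetime.datetime(2014, 10, day) for day in (12, 13, 14)]
d = {
    'ID1': {'start date': datetime.datetime(2014, 10, 12), 'qty': 12},
    'ID2': {'start date': datetime.datetime(2014, 10, 13), 'qty': 34},
    'ID3': {'start date': datetime.datetime(2014, 10, 12), 'qty': 35},
    'ID4': {'start date': datetime.datetime(2014, 10, 11), 'qty': 40},
}

# Group the quantities by start date.
quantities = collections.defaultdict(list)
for v in d.values():
    quantities[v["start date"]].append(v["qty"])

# Average each group; every group has at least one value, so len(q) > 0.
means = {k: sum(q) / len(q) for k, q in quantities.items()}

# Look up each requested date, using 0 when no record starts on that date.
my_qty = [means.get(day, 0) for day in my_dates]
print(my_qty)  # [23.5, 34.0, 0]
```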
If you wanted to be clever, it's possible to compute the mean in a single pass by keeping the current mean and the tally of the number of values you've seen. You can even abstract this in a class:
class RunningMean(object):
    def __init__(self, mean=None, n=0):
        self.mean = mean
        self.n = n

    def insert(self, other):
        if self.mean is None:
            self.mean = 0.0
        self.mean = (self.mean * self.n + other) / (self.n + 1)
        self.n += 1

    def __repr__(self):
        args = (self.__class__.__name__, self.mean, self.n)
        return "{}(mean={}, n={})".format(*args)
And one pass through your data will give you your answer:
import collections
means = collections.defaultdict(RunningMean)
for k, v in d.iteritems():
    means[v["start date"]].insert(v["qty"])
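The update inside insert relies on the identity new_mean = (old_mean * n + x) / (n + 1). As a quick sanity check, the following sketch compares the incremental result with the ordinary batch mean. It assumes Python 3 and uses a hypothetical standalone function, running_mean, in place of the class.

```python
def running_mean(values):
    """Fold values into a mean one at a time; return (mean, count)."""
    mean, n = 0.0, 0
    for x in values:
        # Same update rule as RunningMean.insert.
        mean = (mean * n + x) / (n + 1)
        n += 1
    return mean, n


qty = [12, 35]  # the two quantities that start on 2014-10-12
mean, n = running_mean(qty)
print(mean, n)                       # 23.5 2
print(mean == sum(qty) / len(qty))   # True
```

For long streams of floats the incremental and batch results can differ by rounding error, so an exact equality check like this is only safe on small examples.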
The really simple way is to use the pandas library, as it was made for things like this. Here's some code:
import pandas as pd
df = pd.DataFrame.from_dict(d, orient="index")
means = df.groupby("start date").aggregate(np.mean)
Giving:
>>> means
             qty
start date
2014-10-11  40.0
2014-10-12  23.5
2014-10-13  34.0
Upvotes: 6
Here is some working code which should help you:
for item in my_dates:
    nums = [d[key]['qty'] for key in d if d[key]['start date'] == item]
    if nums:
        avg = np.mean(nums)
    else:
        avg = 0
    print item, nums, avg
Note that np.mean of an empty list returns nan (with a RuntimeWarning) rather than 0, so you have to check the length of the list of numbers before averaging it.
Upvotes: 1