Reputation: 363
I want to find the mean price of an item in a dictionary that has (item, shop) pairs as keys and prices as values.
Example dictionary:
{('item1', 'shop1'): 40,
 ('item2', 'shop2'): 14,
 ('item1', 'shop3'): 55}
For example, I want to find the mean price of item1. Is this possible with a multi-key dictionary, or should I change the data structure? Any ideas?
Thanks
Upvotes: 1
Views: 92
Reputation: 1033
I would solve this using a pandas DataFrame:
import random
import pandas as pd

# create a test dict like the one in the question
my_dict = dict(zip(
    [('item' + str(i), 'shop' + str(k)) for i in range(5) for k in range(3)],
    [random.randint(1, 10) for j in range(15)]
))
# create a DataFrame with a MultiIndex
ndx = pd.MultiIndex.from_tuples(list(my_dict.keys()), names=['item', 'shop'])
df = pd.DataFrame(list(my_dict.values()), index=ndx, columns=['price'])
print('\n', df)
# reset the index and use groupby to get the mean price per item;
# selecting [['price']] keeps the non-numeric 'shop' column out of the mean
df.reset_index(inplace=True)
item_mean = df.groupby('item')[['price']].mean()
print('\n', item_mean)
price
item shop
item3 shop0 5
shop2 3
item1 shop0 4
item3 shop1 7
item4 shop0 7
item0 shop0 10
item2 shop1 3
shop0 2
item1 shop1 10
item4 shop2 5
shop1 3
item1 shop2 2
item0 shop1 1
shop2 8
item2 shop2 7
price
item
item0 6.333333
item1 5.333333
item2 4.000000
item3 5.000000
item4 5.000000
Upvotes: 0
Reputation: 1323
It is possible. I'm not sure this is the right data structure for your problem, but you can do it like this.
First, select all the keys for the item you want (here a is the dictionary and I'm selecting 'item1'):
# build a list rather than using filter(), so len() also works on Python 3
interesting_keys = [k for k in a if k[0] == 'item1']
Now you can sum the prices for those keys and divide by the number of keys.
# float() avoids integer division on Python 2
result = sum(a[k] for k in interesting_keys) / float(len(interesting_keys))
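The two steps above can be wrapped into a small helper. This is only a sketch using the question's sample data; the names `prices` and `mean_price` are made up for illustration, and `statistics.mean` is swapped in for the manual sum and division.

```python
from statistics import mean

# Sample data from the question; keys are (item, shop) tuples.
prices = {('item1', 'shop1'): 40, ('item2', 'shop2'): 14, ('item1', 'shop3'): 55}

def mean_price(prices, item):
    """Return the mean price of `item` across all shops (hypothetical helper)."""
    # Unpack each key into (name, shop) and keep only the prices for `item`.
    matching = [price for (name, _shop), price in prices.items() if name == item]
    if not matching:
        # Assumption: an unknown item should be an error rather than return 0.
        raise KeyError(item)
    return mean(matching)

item1_mean = mean_price(prices, 'item1')
print(item1_mean)  # 47.5
```

Raising `KeyError` for an unknown item is a design choice of this sketch; returning `None` would work just as well if missing items are expected.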
If you want a new dictionary with one entry per item, mapped to that item's mean price, you can do something like this:
def group_prices(prices):
    grouped_prices = {}  # running total of prices per item
    number_items = {}    # number of prices seen per item
    for k, v in prices.items():
        grouped_prices[k[0]] = grouped_prices.get(k[0], 0) + v
        number_items[k[0]] = number_items.get(k[0], 0) + 1
    # float() avoids integer division on Python 2
    return {k: v / float(number_items[k]) for (k, v) in grouped_prices.items()}
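A possible variant of the same grouping idea collects the prices per item in lists and averages each list, which avoids keeping two parallel dictionaries. This is a sketch under the assumption that the data looks like the question's example; `mean_price_per_item` is a hypothetical name.

```python
from collections import defaultdict
from statistics import mean

# Sample data from the question; keys are (item, shop) tuples.
prices = {('item1', 'shop1'): 40, ('item2', 'shop2'): 14, ('item1', 'shop3'): 55}

def mean_price_per_item(prices):
    """Map each item to the mean of its prices across shops (hypothetical helper)."""
    by_item = defaultdict(list)
    for (item, _shop), price in prices.items():
        by_item[item].append(price)  # gather every price seen for this item
    return {item: mean(values) for item, values in by_item.items()}

item_means = mean_price_per_item(prices)
print(item_means['item1'])  # 47.5
```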
Upvotes: 1
Reputation: 375715
Since this is labelled pandas... If you make this a pandas Series you can groupby the 0th level:
In [11]: d = {('item1', 'shop1'): 40, ('item2', 'shop2'): 14,('item1', 'shop3'): 55}
In [12]: s = pd.Series(d)
In [13]: s
Out[13]:
item1 shop1 40
shop3 55
item2 shop2 14
dtype: int64
In [14]: s.groupby(level=0).mean()
Out[14]:
item1 47.5
item2 14.0
dtype: float64
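To get the mean for just one item, as the question asks, something along these lines should work. It is a sketch that assumes pandas is installed; the variable names are illustrative only.

```python
import pandas as pd

d = {('item1', 'shop1'): 40, ('item2', 'shop2'): 14, ('item1', 'shop3'): 55}
s = pd.Series(d)  # tuple keys become a two-level MultiIndex

# Option 1: compute all per-item means, then look up one item.
means = s.groupby(level=0).mean()
item1_from_groupby = means.loc['item1']

# Option 2: select the item's rows on the first index level, then average them.
item1_direct = s.loc['item1'].mean()

print(item1_from_groupby, item1_direct)
```

Option 2 skips the grouping step, so it may be the simpler choice when only a single item is needed.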
Upvotes: 1
Reputation: 6581
You can create a pandas DataFrame using nested lists. You can then use pandas groupby to get the mean you're looking for.
import pandas as pd
df = pd.DataFrame([['item1', 'shop1', 40],
['item2', 'shop2', 14],
['item1', 'shop3', 55]], columns=('item', 'shop', 'price'))
df
item shop price
0 item1 shop1 40
1 item2 shop2 14
2 item1 shop3 55
result_mean = df.groupby('item')['price'].mean()
result_mean
item
item1 47.5
item2 14.0
Name: price, dtype: float64
Upvotes: 1