pythonuser23

Reputation: 21

Dictionary comprehension to get average per keyed item from a list of tuples with repeated key values

mylist = [(0.8132195134810816, 'A'), (0.79314903781799, 'B'), (0.3931539216409497, 'A'), (0.23487952756579994, 'B'), (0.06686513021322447, 'C'), (0.008103227303653366, 'C'), (0.007403104126575008, 'D'), (-0.0041128367759631496, 'D'), (-0.005739579154553378, 'D'), (-0.008074572907817046, 'B')]

I've tried a few conversions. Note that I can do this with a for loop; I'd like to know whether there's a way to do it with a dictionary comprehension. Of course, I can build a regular dictionary, but I was hoping for a series of filter one-liners.

newdict = dict()
for symbol in ['A', 'B', 'C', 'D']:
    # collect every value tagged with this symbol, then average them
    values = [item for item, symbol_item in mylist if symbol_item == symbol]
    average = sum(values) / len(values)
    print(symbol, average)
    newdict[symbol] = average

I am hoping there is a way to do this without listing the symbols explicitly.
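As a minimal sketch, assuming the goal is simply to avoid hard-coding ['A', 'B', 'C', 'D'], the distinct symbols can be collected from the data with a set comprehension. The sorted() call is only there to make the key order predictable; the loop itself is unchanged, so it still rescans mylist once per symbol.

```python
mylist = [(0.8132195134810816, 'A'), (0.79314903781799, 'B'), (0.3931539216409497, 'A'), (0.23487952756579994, 'B'), (0.06686513021322447, 'C'), (0.008103227303653366, 'C'), (0.007403104126575008, 'D'), (-0.0041128367759631496, 'D'), (-0.005739579154553378, 'D'), (-0.008074572907817046, 'B')]

# Derive the distinct symbols from the data instead of hard-coding them;
# sorted() just gives a deterministic key order.
symbols = sorted({symbol for _, symbol in mylist})

newdict = {}
for symbol in symbols:
    values = [item for item, symbol_item in mylist if symbol_item == symbol]
    newdict[symbol] = sum(values) / len(values)

print(newdict)
```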

I tried a defaultdict to make the value of each key into a list, but that didn't work:

from collections import defaultdict

mydict = defaultdict(list)
mydict.update({key: (mydict[key] + [value]) for value, key in mylist})
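The reason this attempt fails is that the whole comprehension is evaluated before update() runs. While it is being built, mydict[key] is always still an empty list (reading it only inserts []), so every entry is [] + [value], and a repeated key in the comprehension simply overwrites its earlier entry. The sketch below reproduces the attempt and shows that each symbol keeps only its last value.

```python
from collections import defaultdict

mylist = [(0.8132195134810816, 'A'), (0.79314903781799, 'B'), (0.3931539216409497, 'A'), (0.23487952756579994, 'B'), (0.06686513021322447, 'C'), (0.008103227303653366, 'C'), (0.007403104126575008, 'D'), (-0.0041128367759631496, 'D'), (-0.005739579154553378, 'D'), (-0.008074572907817046, 'B')]

mydict = defaultdict(list)
# The dict comprehension is fully built first; mydict[key] is still [] at that
# point, so later duplicates of a key replace earlier ones in the comprehension.
mydict.update({key: (mydict[key] + [value]) for value, key in mylist})

print(dict(mydict))
# 'A' -> [0.3931539216409497], 'B' -> [-0.008074572907817046], ...
```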

Upvotes: 2

Views: 42

Answers (2)

juanpa.arrivillaga

Reputation: 96257

You can do this, but it's always going to be ugly. In Python 3.8+, you can use an assignment expression (the := "walrus" operator) to assign to values inside the comprehension:

>>> mylist = [(0.8132195134810816, 'A'), (0.79314903781799, 'B'), (0.3931539216409497, 'A'), (0.23487952756579994, 'B'), (0.06686513021322447, 'C'), (0.008103227303653366, 'C'), (0.007403104126575008, 'D'), (-0.0041128367759631496, 'D'), (-0.005739579154553378, 'D'), (-0.008074572907817046, 'B')]
>>> result = {
...     symbol : sum((values:= [item for item, symbol_item in mylist if symbol_item == symbol])) / len(values)
...     for symbol in ['A','B','C','D']
... }
>>> result
{'A': 0.6031867175610156, 'B': 0.33998466415865763, 'C': 0.03748417875843892, 'D': -0.0008164372679805064}

But this is really confusing and unreadable. You shouldn't strive to cram your code into one-liners; that's bad practice. Aim instead for readable, efficient, and maintainable code.

Comprehensions sometimes make code more readable, and that is their main advantage. When they don't, as here, you shouldn't use them.

Note, without assignment expressions, you'd have to rely on another for-clause to assign to values:

>>> result = {
...     symbol : sum(values) / len(values)
...     for symbol in ['A','B','C','D']
...     for values in ([item for item, symbol_item in mylist if symbol_item == symbol],)
... }
>>> result
{'A': 0.6031867175610156, 'B': 0.33998466415865763, 'C': 0.03748417875843892, 'D': -0.0008164372679805064}

But really, that adds no clarity compared to a regular for-loop.
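If a comprehension is still wanted, one way to keep it readable is to move the per-symbol work into a small named function, so the comprehension only says "average for each symbol". The helper name average_for is hypothetical and not part of the answer above; this is a sketch, and it still scans mylist once per symbol, just like the loop.

```python
mylist = [(0.8132195134810816, 'A'), (0.79314903781799, 'B'), (0.3931539216409497, 'A'), (0.23487952756579994, 'B'), (0.06686513021322447, 'C'), (0.008103227303653366, 'C'), (0.007403104126575008, 'D'), (-0.0041128367759631496, 'D'), (-0.005739579154553378, 'D'), (-0.008074572907817046, 'B')]


def average_for(symbol, pairs):
    """Hypothetical helper: mean of the values in pairs tagged with symbol."""
    values = [item for item, symbol_item in pairs if symbol_item == symbol]
    return sum(values) / len(values)


result = {symbol: average_for(symbol, mylist) for symbol in ['A', 'B', 'C', 'D']}
print(result)
```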

You could also iterate over:

[item for item, symbol_item in mylist if symbol_item == symbol]

twice, once to get the sum and again to get the length, but I won't even write out that insanity.

Now, the best way to do this, IMO, is the grouping idiom: the code stays linear time, and you don't even need to know the symbols ahead of time:

>>> from collections import defaultdict
>>> result = defaultdict(list)
>>> for value, symbol in mylist:
...     result[symbol].append(value)
...
>>> result = {symbol: sum(values)/len(values) for symbol, values in result.items()}
>>> result
{'A': 0.6031867175610156, 'B': 0.33998466415865763, 'C': 0.03748417875843892, 'D': -0.0008164372679805064}
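A variation on the same single-pass grouping idea, sketched here as an assumption rather than taken from the answer, keeps only a running total and count per symbol instead of storing every value in a list. It is still linear time, but memory per symbol stays constant.

```python
from collections import defaultdict

mylist = [(0.8132195134810816, 'A'), (0.79314903781799, 'B'), (0.3931539216409497, 'A'), (0.23487952756579994, 'B'), (0.06686513021322447, 'C'), (0.008103227303653366, 'C'), (0.007403104126575008, 'D'), (-0.0041128367759631496, 'D'), (-0.005739579154553378, 'D'), (-0.008074572907817046, 'B')]

# One pass: each symbol maps to a mutable [total, count] pair.
totals = defaultdict(lambda: [0.0, 0])
for value, symbol in mylist:
    acc = totals[symbol]
    acc[0] += value
    acc[1] += 1

result = {symbol: total / count for symbol, (total, count) in totals.items()}
print(result)
```

Keys come out in first-seen order, since totals is a dict subclass that preserves insertion order.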

Upvotes: 0

Jab

Reputation: 27515

You can use itertools.groupby and statistics.mean; just make sure the input is sorted by the letters first, because groupby only groups consecutive items with equal keys. Here I used operator.itemgetter to get the numbers and letters on the fly:

from itertools import groupby
from statistics import mean
from operator import itemgetter

mylist = [(0.8132195134810816, 'A'), (0.79314903781799, 'B'), (0.3931539216409497, 'A'), (0.23487952756579994, 'B'), (0.06686513021322447, 'C'), (0.008103227303653366, 'C'), (0.007403104126575008, 'D'), (-0.0041128367759631496, 'D'), (-0.005739579154553378, 'D'), (-0.008074572907817046, 'B')]

get_key = itemgetter(1)
get_value = itemgetter(0)
sorted_list = sorted(mylist, key=get_key)

newdict = {k: mean(map(get_value, g)) for k, g in groupby(sorted_list, get_key)}

print(newdict)

{'A': 0.6031867175610156, 'B': 0.33998466415865763, 'C': 0.03748417875843892, 'D': -0.0008164372679805064}
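To see why the sort matters, the sketch below runs groupby on the unsorted list. Because groupby starts a new group whenever the key changes, 'A' and 'B' appear as several separate runs, and in a dict comprehension each later run silently overwrites the earlier one, so the "average" for a key is only the mean of its last run.

```python
from itertools import groupby
from operator import itemgetter
from statistics import mean

mylist = [(0.8132195134810816, 'A'), (0.79314903781799, 'B'), (0.3931539216409497, 'A'), (0.23487952756579994, 'B'), (0.06686513021322447, 'C'), (0.008103227303653366, 'C'), (0.007403104126575008, 'D'), (-0.0041128367759631496, 'D'), (-0.005739579154553378, 'D'), (-0.008074572907817046, 'B')]

get_key = itemgetter(1)
get_value = itemgetter(0)

# groupby only merges *consecutive* items with the same key.
unsorted_keys = [k for k, _ in groupby(mylist, get_key)]
sorted_keys = [k for k, _ in groupby(sorted(mylist, key=get_key), get_key)]
print(unsorted_keys)  # ['A', 'B', 'A', 'B', 'C', 'D', 'B']
print(sorted_keys)    # ['A', 'B', 'C', 'D']

# Without sorting, later runs overwrite earlier ones for the same key.
broken = {k: mean(map(get_value, g)) for k, g in groupby(mylist, get_key)}
print(broken['B'])    # only the last 'B' run survives
```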

Upvotes: 2
