Quickbeam2k1

Reputation: 5437

From a list of dicts get the maximal length of the values for each key in a pythonic way

I'm looking for a more pythonic way to get the maximal length of the values for each key in a list of dictionaries.

My approach looks like this:

lst =[{'a':'asdasd', 'b': 123},{'a': 'asdasdasdas'}, {'a':123,'b':'asdasd'}]
dct = {}
for l in lst:
    for key in l:
        dct.update({key: max(dct.get(key,0), len(str(l.get(key,0))))})
print(dct)

The output is:

{'b': 6, 'a': 11}

The str call is needed so that len also works on integers (and on None values).

Is this approach "pythonic", or is there a smoother, more readable way using list comprehensions or similar methods?
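To illustrate why the conversion matters, here is a minimal sketch. The helper name display_length is made up for this example and is not part of the code above.

```python
# len() raises TypeError for int and None, so convert to str first.
values = ['asdasd', 123, None]

def display_length(value):
    """Length of the value's string form (hypothetical helper name)."""
    return len(str(value))

try:
    len(123)
    int_has_len = True
except TypeError:
    int_has_len = False

lengths = [display_length(v) for v in values]
print(int_has_len)  # False
print(lengths)      # [6, 3, 4]  (str(None) is 'None', so its length is 4)
```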

Upvotes: 3

Views: 225

Answers (5)

JoGr

Reputation: 1557

I like this take for its readability and its use of Python features:

dicts = [{'a':'asdasd', 'b': 123},{'a': 'asdasdasdas'}, {'a':123,'b':'asdasd'}]

def get_highest(current_highest, items_left):
    # Base case: every dict has been processed.
    if not items_left:
        return current_highest
    # Note: pop() consumes the list that was passed in.
    item = items_left.pop()
    # Keep only the keys whose string length beats the current maximum.
    higher = {key: len(str(value)) for key, value in item.items() if len(str(value)) > current_highest.get(key, 0)}
    current_highest.update(higher)
    return get_highest(current_highest, items_left)

print(get_highest(dict(), dicts))

{'b': 6, 'a': 11}

Upvotes: 1

Trey Hunner

Reputation: 11814

I think your approach is fairly Pythonic, except that I would change the update line to be a little clearer:

# A little terse
dct.update({key: max(dct.get(key,0), len(str(l.get(key,0))))})
# A little simpler
dct[key] = max(dct.get(key, 0), len(str(l[key])))

Here's a solution with variable names modified as well:

dict_list =[{'a':'asdasd', 'b': 123},{'a': 'asdasdasdas'}, {'a':123,'b':'asdasd'}]
max_lengths = {}
for dictionary in dict_list:
    for k, v in dictionary.items():
        max_lengths[k] = max(max_lengths.get(k, 0), len(str(v)))
print(max_lengths)
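As a quick check against the output in the question, the same loop can be wrapped in a function. This is only a sketch; the name max_value_lengths is hypothetical.

```python
def max_value_lengths(dict_list):
    """Return {key: longest len(str(value))} across all dicts.

    The function name is hypothetical; the body is the loop shown above.
    """
    max_lengths = {}
    for dictionary in dict_list:
        for k, v in dictionary.items():
            max_lengths[k] = max(max_lengths.get(k, 0), len(str(v)))
    return max_lengths

dict_list = [{'a': 'asdasd', 'b': 123}, {'a': 'asdasdasdas'}, {'a': 123, 'b': 'asdasd'}]
result = max_value_lengths(dict_list)
print(result)
```

The result has 11 for 'a' and 6 for 'b', matching the question, and an empty input list gives an empty dict.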

Upvotes: 3

SirParselot

Reputation: 2700

My previous answer was wrong and I did not realize it, but here are two others that do work. The first one uses pandas. It creates a dataframe, sorts by key (ascending) and then by value (descending), takes the first row of each group, and then creates a dictionary out of that:

import pandas as pd
lst = [{'a':'asdasd', 'b': 123},{'a': 'asdasdasdas'}, {'a':123,'b':'asdasd'}]

# One row per (key, string length of value) pair
d = pd.DataFrame([(k,len(str(v))) for i in lst for k,v in i.items()], columns=['Key','Value'])
# DataFrame.sort was removed from pandas; sort_values is the current equivalent
d = d.sort_values(['Key','Value'], ascending=[True, False])
d = d.groupby('Key').first().reset_index()
d = dict(zip(d.Key, d.Value))  # or d.set_index('Key')['Value'].to_dict()
print(d)

{'a': 11, 'b': 6}

If you want something that is easily readable and uses only built-ins, then this should do:

lst = [{'a':'asdasd', 'b': 123},{'a': 'asdasdasdas'}, {'a':123,'b':'asdasd'}]
dct={}

for i in lst:
    for k,v in i.items():
        if k in dct:
            if len(str(v)) > dct[k]:
                dct[k] = len(str(v))
        else:
            dct[k] = len(str(v))
print(dct)

{'a': 11, 'b': 6}

Upvotes: 1

James Harrison

Reputation: 165

The other answers focus on using Python features rather than on readability. Personally, I'm of the opinion that readability and simplicity are the most important of all the 'pythonic' traits.

(I simplified the data to use strings for everything, but it would work with integers as well if you drop in a str().)

from collections import defaultdict

lst =[{'a':'asdasd', 'b': '123'},{'b': 'asdasdasdas'}, {'a':'123','b':'asdasd'}]

def merge_dict(dic1, dic2):
    # Append every value of dic2 to the list stored under the same key in dic1
    for key, value in dic2.items():
        dic1[key].append(value)

combined = defaultdict(list)
for dic in lst:
    merge_dict(combined, dic)

print( {key : max(map(len,value)) for key, value in combined.items() } )
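For the mixed-type data from the question, the str() mentioned above could be dropped in where the values are collected. A rough sketch, with the merge step inlined for brevity:

```python
from collections import defaultdict

# Original data from the question, including integer values.
lst = [{'a': 'asdasd', 'b': 123}, {'a': 'asdasdasdas'}, {'a': 123, 'b': 'asdasd'}]

combined = defaultdict(list)
for dic in lst:
    for key, value in dic.items():
        # Convert here so that len() works on every collected value.
        combined[key].append(str(value))

max_lengths = {key: max(map(len, values)) for key, values in combined.items()}
print(max_lengths)
```

This gives 11 for 'a' and 6 for 'b', the same result as in the question.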

Upvotes: 1

Steven

Reputation: 5780

Here's another way that doesn't rely on sorting/zipping, but I wouldn't say one is more Pythonic than the other.

from itertools import chain

lst =[{'a':'asdasd', 'b': 123}, {'a': 'asdasdasdas'}, {'a':123,'b':'asdasd'}]
dct = {
    k: max(len(str(d.get(k, ""))) for d in lst)
    for k in set(chain.from_iterable(d.keys() for d in lst))
}

print(dct)

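Alternatively, you can use groupby. It only groups consecutive items, so the pairs have to be sorted by key first:

from itertools import chain, groupby

lst =[{'a':'asdasd', 'b': 123}, {'a': 'asdasdasdas'}, {'a':123,'b':'asdasd'}]
dct = {
    k: max(len(str(v)) for _, v in g)
    for k, g in groupby(
        # Sort on the key only, so mixed value types are never compared
        sorted(chain.from_iterable(d.items() for d in lst), key=lambda p: p[0]),
        lambda p: p[0]
    )
}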

print(dct)
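One caveat with groupby is worth a small demonstration: it only groups consecutive items with equal keys, so unsorted pairs can silently produce a wrong maximum. The data and the helper name max_lengths below are made up for this sketch.

```python
from itertools import chain, groupby

# Hypothetical data where the longest 'a' value comes before a shorter one.
lst = [{'a': 'long value', 'b': 1}, {'a': 'x'}]
pairs = list(chain.from_iterable(d.items() for d in lst))

def max_lengths(pairs):
    """Group (key, value) pairs with groupby and take the longest str(value)."""
    return {
        k: max(len(str(v)) for _, v in g)
        for k, g in groupby(pairs, key=lambda p: p[0])
    }

# Unsorted: 'a' shows up in two separate groups and the later one wins.
unsorted_result = max_lengths(pairs)
# Sorted by key: each key forms exactly one group.
sorted_result = max_lengths(sorted(pairs, key=lambda p: p[0]))

print(unsorted_result)  # {'a': 1, 'b': 1}
print(sorted_result)    # {'a': 10, 'b': 1}
```

With the list from the question the unsorted version happens to give the right answer, because the longest 'a' value sits in the last group of 'a' pairs.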

Upvotes: 1
