Reputation: 5437
I'm looking for a more pythonic way to get the maximal length of the values for each key in a list of dictionaries.
My approach looks like this
lst = [{'a': 'asdasd', 'b': 123}, {'a': 'asdasdasdas'}, {'a': 123, 'b': 'asdasd'}]
dct = {}
for l in lst:
    for key in l:
        dct.update({key: max(dct.get(key, 0), len(str(l.get(key, 0))))})
print(dct)
The output gives
{'b': 6, 'a': 11}
The str function is needed to get the length of integers (and also of None values).
Is this approach "pythonic", or is there a smoother, more readable way using list comprehensions or similar methods?
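To illustrate the point about str: len is not defined for integers or None, so the value is converted to text first and the length of that text is measured. A minimal sketch, where display_len is a made-up helper name used only for illustration:

```python
def display_len(value):
    # len(123) or len(None) would raise TypeError, so convert to text first
    return len(str(value))

print(display_len('asdasd'))  # 6
print(display_len(123))       # 3
print(display_len(None))      # 4, the length of the text 'None'
```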
Upvotes: 3
Views: 225
Reputation: 1557
I like this take for its readability and its use of Python's own features:
dicts = [{'a':'asdasd', 'b': 123},{'a': 'asdasdasdas'}, {'a':123,'b':'asdasd'}]
def get_highest(current_highest, items_left):
    if not items_left:
        return current_highest
    else:
        item = items_left.pop()
        higher = {key: len(str(value)) for key, value in item.items() if (len(str(item[key])) > current_highest.get(key, 0))}
        if higher:
            current_highest.update(higher)
        return get_highest(current_highest, items_left)
print(get_highest(dict(), dicts))
{'b': 6, 'a': 11}
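One thing to be aware of with the recursive version: items_left.pop() removes elements from the list that was passed in, so dicts is empty after the call. A possible non-mutating sketch using functools.reduce is shown below. This is a variation rather than the code above, and merge_lengths is a hypothetical helper name:

```python
from functools import reduce

dicts = [{'a': 'asdasd', 'b': 123}, {'a': 'asdasdasdas'}, {'a': 123, 'b': 'asdasd'}]

def merge_lengths(current_highest, item):
    # Build a new dict holding the longer length per key; inputs stay untouched
    merged = dict(current_highest)
    for key, value in item.items():
        merged[key] = max(merged.get(key, 0), len(str(value)))
    return merged

result = reduce(merge_lengths, dicts, {})
print(result)
print(len(dicts))  # the input list still has all three dictionaries
```

Copying the accumulator on every step costs a little extra work, which should not matter for small inputs like this one.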
Upvotes: 1
Reputation: 11814
I think your approach is fairly Pythonic except that I would change the update
line to be a little more clear:
# A little terse
dct.update({key: max(dct.get(key,0), len(str(l.get(key,0))))})
# A little simpler
dct[key] = max(dct.get(key, 0), len(str(l[key])))
Here's a solution with variable names modified as well:
dict_list = [{'a': 'asdasd', 'b': 123}, {'a': 'asdasdasdas'}, {'a': 123, 'b': 'asdasd'}]
max_lengths = {}
for dictionary in dict_list:
    for k, v in dictionary.items():
        max_lengths[k] = max(max_lengths.get(k, 0), len(str(v)))
print(max_lengths)
Upvotes: 3
Reputation: 2700
My previous answer was wrong, which I had not realized, but here are two others that do work. The first one uses pandas: it creates a DataFrame, sorts by key and then by value in descending order, takes the first value of each group, and builds a dictionary from the result.
import pandas as pd
lst = [{'a':'asdasd', 'b': 123},{'a': 'asdasdasdas'}, {'a':123,'b':'asdasd'}]
dct={}
d = pd.DataFrame([(k,len(str(v))) for i in lst for k,v in i.items()], columns=['Key','Value'])
# DataFrame.sort was removed from pandas; sort_values is the current method
d = d.sort_values(['Key', 'Value'], ascending=[True, False])
d = d.groupby('Key').first().reset_index()
d = dict(zip(d.Key, d.Value))  # or d.set_index('Key')['Value'].to_dict()
print(d)
{'a': 11, 'b': 6}
If you want something that is easily readable and uses only built-ins, then this should do:
lst = [{'a': 'asdasd', 'b': 123}, {'a': 'asdasdasdas'}, {'a': 123, 'b': 'asdasd'}]
dct = {}
for i in lst:
    for k, v in i.items():
        if k in dct:
            if len(str(v)) > dct[k]:
                dct[k] = len(str(v))
        else:
            dct[k] = len(str(v))
print(dct)
{'a': 11, 'b': 6}
Upvotes: 1
Reputation: 165
The other answers focus on using Python features rather than readability. Personally I'm of the opinion that readability and simplicity are the most important of all the 'pythonic' traits.
(I simplified to use strings for everything, but it would work with integers as well if you drop in a str().)
from collections import defaultdict
lst = [{'a': 'asdasd', 'b': '123'}, {'b': 'asdasdasdas'}, {'a': '123', 'b': 'asdasd'}]
def merge_dict(dic1, dic2):
    for key, value in dic2.items():
        dic1[key].append(value)
combined = defaultdict(list)
for dic in lst:
    merge_dict(combined, dic)
print({key: max(map(len, value)) for key, value in combined.items()})
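As a sketch of the str() variant mentioned above, here is the same idea applied to the mixed data from the question. The only change is that each value is stored in its string form before its length is taken; the name max_lengths is chosen here for illustration:

```python
from collections import defaultdict

lst = [{'a': 'asdasd', 'b': 123}, {'a': 'asdasdasdas'}, {'a': 123, 'b': 'asdasd'}]

combined = defaultdict(list)
for dic in lst:
    for key, value in dic.items():
        # Store the string form so len() also works for integers and None
        combined[key].append(str(value))

max_lengths = {key: max(map(len, values)) for key, values in combined.items()}
print(max_lengths)
```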
Upvotes: 1
Reputation: 5780
Here's another way that doesn't rely on sorting/zipping but I wouldn't say one is more Pythonic than the other.
from itertools import chain
lst =[{'a':'asdasd', 'b': 123}, {'a': 'asdasdasdas'}, {'a':123,'b':'asdasd'}]
dct = {
    k: max(len(str(d.get(k, ""))) for d in lst)
    for k in set(chain.from_iterable(d.keys() for d in lst))
}
print(dct)
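Alternatively, you can use groupby. It only groups consecutive items, so the key/value pairs have to be sorted by key first:
from itertools import chain, groupby
lst = [{'a': 'asdasd', 'b': 123}, {'a': 'asdasdasdas'}, {'a': 123, 'b': 'asdasd'}]
dct = {
    k: max(len(str(v)) for _, v in g)
    for k, g in groupby(
        # sort on the key only, so mixed int/str values are never compared
        sorted(chain.from_iterable(d.items() for d in lst), key=lambda p: p[0]),
        lambda p: p[0]
    )
}
print(dct)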
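To see why input order matters for groupby: it starts a new group every time the key changes, so a key that reappears later overwrites its earlier entry in the resulting dict. The sketch below uses a made-up list named data and a hypothetical helper max_lengths to compare unsorted and sorted input:

```python
from itertools import chain, groupby

data = [{'a': 'xxxxxxxx'}, {'b': 1}, {'a': 'x'}]
pairs = list(chain.from_iterable(d.items() for d in data))

def max_lengths(pairs):
    # groupby yields one group per run of consecutive pairs with the same key
    return {k: max(len(str(v)) for _, v in g)
            for k, g in groupby(pairs, key=lambda p: p[0])}

unsorted_result = max_lengths(pairs)
sorted_result = max_lengths(sorted(pairs, key=lambda p: p[0]))
print(unsorted_result)  # {'a': 1, 'b': 1}, the later 'a' group replaced the first
print(sorted_result)    # {'a': 8, 'b': 1}
```

With the sample list from the question the unsorted version happens to produce the right answer, which makes this easy to miss.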
Upvotes: 1