Reputation: 726
I want to generate a dictionary from a list of dictionaries, grouping list items by the value of some key, such as:
input_list = [
{'a':'tata', 'b': 'foo'},
{'a':'pipo', 'b': 'titi'},
{'a':'pipo', 'b': 'toto'},
{'a':'tata', 'b': 'bar'}
]
output_dict = {
'pipo': [
{'a': 'pipo', 'b': 'titi'},
{'a': 'pipo', 'b': 'toto'}
],
'tata': [
{'a': 'tata', 'b': 'foo'},
{'a': 'tata', 'b': 'bar'}
]
}
So far I've found two ways of doing this. The first simply iterates over the list, create sublists in the dict for each key value and append elements matching these keys to the sublist:
l = [
{'a':'tata', 'b': 'foo'},
{'a':'pipo', 'b': 'titi'},
{'a':'pipo', 'b': 'toto'},
{'a':'tata', 'b': 'bar'}
]
res = {}
for e in l:
res[e['a']] = res.get(e['a'], [])
res[e['a']].append(e)
And another using itertools.groupby
:
import itertools
from operator import itemgetter
l = [
{'a':'tata', 'b': 'foo'},
{'a':'pipo', 'b': 'titi'},
{'a':'pipo', 'b': 'toto'},
{'a':'tata', 'b': 'bar'}
]
l = sorted(l, key=itemgetter('a'))
res = dict((k, list(g)) for k, g in itertools.groupby(l, key=itemgetter('a')))
I wonder which alternative is the most efficient?
Is there any more pythonic/concise or better performing way of achieving this?
Upvotes: 30
Views: 20594
Reputation: 881
The best approach is the first one you mentioned, and you can even make it more elegant by using setdefault
as mentioned by bernhard above. The complexity of this approach is O(n) since we simply iterate over the input once and for each item we perform a lookup into the output dict we are building to find the appropriate list to append it to, which takes constant time (lookup+append) for each item. So overlal complexity is O(n) which is optimal.
When using itertools.groupby, you must sort the input beforehand (which is O(n log n)).
Upvotes: 1
Reputation: 8821
Is it correct that you want to group your input list by the value of the 'a' key of the list elements? If so, your first approach is the best, one minor improvement, use dict.setdefault
:
res = {}
for item in l:
res.setdefault(item['a'], []).append(item)
Upvotes: 37
Reputation: 358
If by efficient you mean "time efficient", it is possible to measure it using the timeit
built in module.
For example:
import timeit
import itertools
from operator import itemgetter
input = [{'a': 'tata', 'b': 'foo'},
{'a': 'pipo', 'b': 'titi'},
{'a': 'pipo', 'b': 'toto'},
{'a': 'tata', 'b': 'bar'}]
def solution1():
res = {}
for e in input:
res[e['a']] = res.get(e['a'], [])
res[e['a']].append(e)
return res
def solution2():
l = sorted(input, key=itemgetter('a'))
res = dict(
(k, list(g)) for k, g in itertools.groupby(l, key=itemgetter('a'))
)
return res
t = timeit.Timer(solution1)
print(t.timeit(10000))
# 0.0122511386871
t = timeit.Timer(solution2)
print(t.timeit(10000))
# 0.0366218090057
Please refer to the timeit official docs for further information.
Upvotes: 10
Reputation: 90889
A one liner -
>>> import itertools
>>> input_list = [
... {'a':'tata', 'b': 'foo'},
... {'a':'pipo', 'b': 'titi'},
... {'a':'pipo', 'b': 'toto'},
... {'a':'tata', 'b': 'bar'}
... ]
>>> {k:[v for v in input_list if v['a'] == k] for k, val in itertools.groupby(input_list,lambda x: x['a'])}
{'tata': [{'a': 'tata', 'b': 'foo'}, {'a': 'tata', 'b': 'bar'}], 'pipo': [{'a': 'pipo', 'b': 'titi'}, {'a': 'pipo', 'b': 'toto'}]}
Upvotes: 7