Merlin1896
Merlin1896

Reputation: 1821

Optimize creation of dictionary

I have a list with ids called ids. Every element in ids is a string. One id can exist multiple times in this list.

My aim is to create a dictionary which has the the number of occurrences as a key and the value is a list of the ids which appear that often. My current approach looks like this:

from collections import defaultdict
import numpy as np
ids = ["foo", "foo", "bar", "hi", "hi"]
counts = defaultdict(list)
for id in np.unique(ids):
    counts[ids.count(id)].append(id)

Output:

print counts
--> defaultdict(<type 'list'>, {1: ['bar'], 2: ['foo', 'hi']})

This works nicely if the list of ids is not too long. However, for longer lists the performance is rather bad.

How can I make this faster?

Upvotes: 3

Views: 169

Answers (1)

tobias_k
tobias_k

Reputation: 82899

Instead of calling count for each element in the list, create a collections.Counter for the entire list:

ids = ["foo", "foo", "bar", "hi", "hi"]
counts = defaultdict(list)
for i, c in Counter(ids).items():
    counts[c].append(i)
# counts: defaultdict(<class 'list'>, {1: ['bar'], 2: ['foo', 'hi']})

If you prefer a one-liner, you could also combine Counter.most_common (for view on the elements sorted by counts) and itertools.groupby (but I rather wouldn't)

>>> {k: [v[0] for v in g] for k, g in groupby(Counter(ids).most_common(), lambda x: x[1])}
{1: ['bar'], 2: ['foo', 'hi']}

Upvotes: 4

Related Questions