Reputation: 1821
I have a list with ids called ids
. Every element in ids
is a string. One id
can exist multiple times in this list.
My aim is to create a dictionary which has the the number of occurrences as a key and the value is a list of the ids which appear that often. My current approach looks like this:
from collections import defaultdict
import numpy as np
ids = ["foo", "foo", "bar", "hi", "hi"]
counts = defaultdict(list)
for id in np.unique(ids):
counts[ids.count(id)].append(id)
Output:
print counts
--> defaultdict(<type 'list'>, {1: ['bar'], 2: ['foo', 'hi']})
This works nicely if the list of ids is not too long. However, for longer lists the performance is rather bad.
How can I make this faster?
Upvotes: 3
Views: 169
Reputation: 82899
Instead of calling count
for each element in the list, create a collections.Counter
for the entire list:
ids = ["foo", "foo", "bar", "hi", "hi"]
counts = defaultdict(list)
for i, c in Counter(ids).items():
counts[c].append(i)
# counts: defaultdict(<class 'list'>, {1: ['bar'], 2: ['foo', 'hi']})
If you prefer a one-liner, you could also combine Counter.most_common
(for view on the elements sorted by counts) and itertools.groupby
(but I rather wouldn't)
>>> {k: [v[0] for v in g] for k, g in groupby(Counter(ids).most_common(), lambda x: x[1])}
{1: ['bar'], 2: ['foo', 'hi']}
Upvotes: 4