Reputation: 963
I have a list of numpy arrays of different lengths, some of which repeat, like so:
import numpy as np
multi = [np.array([1, 2, 3]),
         np.array([1, 2]),
         np.array([1, 2, 3, 4]),
         np.array([1, 2, 3]),
         np.array([1, 2])]
From this list, I want a count of the unique arrays (like a histogram over the sequences).
Since numpy arrays are not hashable, I am doing this by converting each array to its string representation and using that as a key for grouping with itertools.groupby, similar to this method:
import itertools
sorted_strings = sorted([str(p) for p in multi])
groups = [(k, len(list(g))) for k, g in itertools.groupby(sorted_strings)]
print(groups)
The output for this is:
[('[1 2 3 4]', 1), ('[1 2 3]', 2), ('[1 2]', 2)]
This is correct, but I'm wondering if there is a more elegant solution, or if there is a better way to store this data than in a list of arrays.
Upvotes: 1
Views: 232
Reputation: 20339
If you're stuck with a version of Python that doesn't define collections.Counter, you could use the method you linked to:
base = sorted(tuple(m) for m in multi)
G = [(k, len(list(g))) for k, g in itertools.groupby(base)]
You'd basically transform each array into a tuple (note that the Counter-based method relies on the same approach).
Note that you may want to make sure your arrays are sorted, so that np.array([2, 1]) and np.array([1, 2]) are considered equivalent:
base = sorted(tuple(sorted(m)) for m in multi)
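For instance, with a hypothetical list that contains the same values in two different orders, the inner sort makes both arrays land in the same group (a minimal sketch; the permuted list below is made up for illustration):
import itertools
import numpy as np

# hypothetical input: [2, 1] is a permutation of [1, 2]
permuted = [np.array([1, 2]), np.array([2, 1]), np.array([1, 2, 3])]

base = sorted(tuple(sorted(m)) for m in permuted)
counts = [(k, len(list(g))) for k, g in itertools.groupby(base)]
# (1, 2) is counted twice, (1, 2, 3) once
print(counts)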
Upvotes: 0
Reputation: 30453
You can use collections.Counter:
>>> from collections import Counter
>>>
>>> Counter(map(tuple, multi)).most_common()
[((1, 2), 2), ((1, 2, 3), 2), ((1, 2, 3, 4), 1)]
To get least common:
>>> Counter(map(tuple, multi)).most_common()[::-1]
[((1, 2, 3, 4), 1), ((1, 2, 3), 2), ((1, 2), 2)]
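If you later need the keys back as arrays rather than tuples, one option is to rebuild them from the Counter (a small sketch, reusing the multi list from the question):
from collections import Counter
import numpy as np

multi = [np.array([1, 2, 3]), np.array([1, 2]), np.array([1, 2, 3, 4]),
         np.array([1, 2, 3]), np.array([1, 2])]

counts = Counter(map(tuple, multi))

# rebuild (array, count) pairs from the tuple keys
as_arrays = [(np.array(k), n) for k, n in counts.items()]

# or query the count of one particular array by converting it to a tuple
print(counts[tuple(np.array([1, 2, 3]))])  # prints 2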
Upvotes: 2