Reputation: 963
I have a list of numpy arrays of different lengths, some of which repeat, like so:
import numpy as np
multi = [np.array([1, 2, 3]),
         np.array([1, 2]),
         np.array([1, 2, 3, 4]),
         np.array([1, 2, 3]),
         np.array([1, 2])]
From this list, I want a count of the unique arrays (like a histogram over the sequences).
Since numpy arrays are not hashable, I am doing this by converting each array to its string representation and using that as a key for grouping with itertools.groupby, similar to this method:
import itertools
sorted_strings = sorted([str(p) for p in multi])
groups = [(k, len(list(g))) for k, g in itertools.groupby(sorted_strings)]
print(groups)
The output for this is:
[('[1 2 3 4]', 1), ('[1 2 3]', 2), ('[1 2]', 2)]
This is correct, but I'm wondering if there is a more elegant solution, or if there is a better way to store this data than in a list of arrays.
Upvotes: 1
Views: 232
Reputation: 20339
If you're stuck with a version of Python that doesn't define collections.Counter, you could use the method you linked to:
base = sorted(tuple(m) for m in multi)
G = [(k, len(list(g))) for k, g in itertools.groupby(base)]
You'd basically transform each array into a tuple (note that the Counter-based method relies on the same approach).
Note that you may want to make sure your arrays are sorted, so that np.array([2, 1]) and np.array([1, 2]) are considered equivalent:
base = sorted(tuple(sorted(m)) for m in multi)
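For instance, with a hypothetical list that contains the same values in two different orders, the inner sort makes both arrays land in the same group (a minimal sketch; the permuted list below is made up for illustration):
import itertools
import numpy as np

# hypothetical input: [2, 1] is a permutation of [1, 2]
permuted = [np.array([1, 2]), np.array([2, 1]), np.array([1, 2, 3])]

base = sorted(tuple(sorted(m)) for m in permuted)
counts = [(k, len(list(g))) for k, g in itertools.groupby(base)]
# (1, 2) is counted twice, (1, 2, 3) once
print(counts)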
Upvotes: 0
Reputation: 30453
You can use collections.Counter:
>>> from collections import Counter
>>>
>>> Counter(map(tuple, multi)).most_common()
[((1, 2), 2), ((1, 2, 3), 2), ((1, 2, 3, 4), 1)]
To get least common:
>>> Counter(map(tuple, multi)).most_common()[::-1]
[((1, 2, 3, 4), 1), ((1, 2, 3), 2), ((1, 2), 2)]
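If you later need the keys back as arrays rather than tuples, one option is to rebuild them from the Counter (a small sketch, reusing the multi list from the question):
from collections import Counter
import numpy as np

multi = [np.array([1, 2, 3]), np.array([1, 2]), np.array([1, 2, 3, 4]),
         np.array([1, 2, 3]), np.array([1, 2])]

counts = Counter(map(tuple, multi))

# rebuild (array, count) pairs from the tuple keys
as_arrays = [(np.array(k), n) for k, n in counts.items()]

# or query the count of one particular array by converting it to a tuple
print(counts[tuple(np.array([1, 2, 3]))])  # prints 2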
Upvotes: 2