user1248490

Reputation: 963

How do I compose a frequency list from unique arrays of different lengths in numpy?

I have a list of numpy arrays of different lengths, some of which repeat, like so:

import numpy as np

multi = [np.array([1, 2, 3]),
      np.array([1, 2]),
      np.array([1, 2, 3, 4]),
      np.array([1, 2, 3]),
      np.array([1, 2])]

From this list, I want a count of the unique arrays (like a histogram over the sequences).

Since numpy arrays are not hashable, I am currently converting each array to its string representation and using that string as the grouping key for itertools.groupby, similar to this method:

import itertools

sorted_strings = sorted([str(p) for p in multi])
groups = [(k, len(list(g))) for k, g in itertools.groupby(sorted_strings)]
print(groups)

The output for this is:

[('[1 2 3 4]', 1), ('[1 2 3]', 2), ('[1 2]', 2)]

This is correct, but I'm wondering if there is a more elegant solution, or if there is a better way to store this data than in a list of arrays.

Upvotes: 1

Views: 232

Answers (2)

Pierre GM

Reputation: 20339

If you're stuck with a version of Python that doesn't define collections.Counter (it was added in Python 2.7), you could use the method you linked to:

import itertools

base = sorted(tuple(m) for m in multi)
G = [(k, len(list(g))) for k, g in itertools.groupby(base)]

You'd basically transform each array into a tuple (note that the Counter-based method relies on the same approach).
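As a self-contained illustration of the tuple-plus-groupby idea, here is a minimal sketch. The function name `count_by_groupby` is made up for this example, and the use of `.tolist()` (so that the keys hold plain Python ints rather than NumPy scalars) is an addition not present in the snippet above.

```python
import itertools

import numpy as np


def count_by_groupby(arrays):
    """Count identical 1-D arrays by sorting their tuple forms and grouping runs."""
    # Tuples can be compared and sorted; numpy arrays cannot be sorted as list items.
    keys = sorted(tuple(a.tolist()) for a in arrays)
    # groupby only merges adjacent equal items, which is why the keys are sorted first.
    return [(key, sum(1 for _ in group)) for key, group in itertools.groupby(keys)]


multi = [np.array([1, 2, 3]),
         np.array([1, 2]),
         np.array([1, 2, 3, 4]),
         np.array([1, 2, 3]),
         np.array([1, 2])]

groups = count_by_groupby(multi)
print(groups)  # [((1, 2), 2), ((1, 2, 3), 2), ((1, 2, 3, 4), 1)]
```

Sorting tuples keeps the numeric ordering of the elements, unlike sorting string representations, which compares character by character.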

If the order of elements within an array should not matter, sort each array before converting it, so that np.array([2,1]) and np.array([1,2]) are counted as the same sequence:

base = sorted(tuple(sorted(m)) for m in multi)
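To show the effect of sorting each array first, here is a small sketch. The helper name `count_ignoring_order` and the sample data are invented for illustration, and it uses collections.Counter rather than groupby for brevity, so it assumes a Python version that provides Counter.

```python
from collections import Counter

import numpy as np


def count_ignoring_order(arrays):
    """Count 1-D arrays, treating arrays with the same elements in any order as equal."""
    # np.sort returns a sorted copy, so the input arrays are left unchanged.
    return Counter(tuple(np.sort(a).tolist()) for a in arrays)


sample = [np.array([2, 1]), np.array([1, 2]), np.array([3, 1, 2])]
order_free = count_ignoring_order(sample)
print(order_free[(1, 2)])     # 2: [2, 1] and [1, 2] share one key
print(order_free[(1, 2, 3)])  # 1
```

Only apply this when the sequences are really meant as unordered collections; for ordered sequences it would merge arrays that should stay distinct.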

Upvotes: 0

K Z

Reputation: 30453

You can use collections.Counter:

>>> from collections import Counter
>>> 
>>> Counter(map(tuple, multi)).most_common()
[((1, 2), 2), ((1, 2, 3), 2), ((1, 2, 3, 4), 1)]

To list the least common first, reverse the result:

>>> Counter(map(tuple, multi)).most_common()[::-1]
[((1, 2, 3, 4), 1), ((1, 2, 3), 2), ((1, 2), 2)]
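For completeness, here is a self-contained sketch of the Counter approach that also touches on the storage question. It makes a few assumptions beyond the answer above: `.tolist()` is used so the keys are plain Python ints, the variable names are illustrative, and the tie ordering relies on Python 3.7+, where equal counts keep first-seen order.

```python
from collections import Counter

import numpy as np

multi = [np.array([1, 2, 3]),
         np.array([1, 2]),
         np.array([1, 2, 3, 4]),
         np.array([1, 2, 3]),
         np.array([1, 2])]

# Tuples are hashable, so they can serve as Counter keys; arrays cannot.
counts = Counter(tuple(a.tolist()) for a in multi)

# Most frequent first.
most = counts.most_common()

# Least frequent first; sorted() is stable, so ties keep first-seen order.
least = sorted(counts.items(), key=lambda item: item[1])

# The tuple keys convert back to arrays whenever array operations are needed.
arrays_with_counts = [(np.array(key), n) for key, n in counts.items()]

print(most)
print(least)
```

Keeping the data as a Counter keyed by tuples stores each distinct sequence once alongside its count, which may be a more compact representation than a list of repeated arrays.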

Upvotes: 2
