Reputation: 63
I have many 2D arrays of size 1161 x 1161 whose entries are the numbers 0, 1, 2 and 3. For instance, one of them is composed in the following way:
521859 zeros, 288972 ones, 481471 twos, 55619 threes.
I would like to find the fastest way to obtain the same array relabeled by frequency, so that the least frequent value becomes 0, the second least frequent becomes 1, and so on. The array above would then be composed of:
55619 zeros, 288972 ones, 481471 twos , 521859 threes
If there is a very Pythonic way, that would be great, of course.
Thanks in advance for any help!
Upvotes: 0
Views: 212
Reputation: 61910
You could use np.unique to get the unique elements and their counts, then build a dictionary whose keys are the old values and whose values are the new ones. Finally, apply it to the whole array using np.vectorize:
import numpy as np
from operator import itemgetter
arr = np.array([2, 2, 0, 0, 0, 1, 3, 3, 3, 3])
# get unique elements and counts
counts = zip(*np.unique(arr, return_counts=True))
# create a lookup dictionary value -> i where values are sorted according to frequency
mapping = {value: i for i, (value, _) in enumerate(sorted(counts, key=itemgetter(1)))}
# apply the dictionary in a vectorized way
result = np.vectorize(mapping.get)(arr)
print(result)
Output
[1 1 2 2 2 0 3 3 3 3]
A perhaps cleaner alternative is to use collections.Counter to count the values and build the mapping dictionary (the array is flattened first so that this also works for 2D input):
from collections import Counter
# count how often each value occurs
counts = Counter(arr.ravel().tolist())
# create a lookup dictionary value -> i where values are sorted according to frequency
mapping = {value: i for i, value in enumerate(sorted(counts, key=counts.get))}
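Since the question asks for speed, note that the NumPy documentation describes np.vectorize as a convenience wrapper that is essentially a Python-level loop. A possible alternative, sketched below and not benchmarked here, replaces the dictionary with a small lookup array and applies it with integer indexing. The function name relabel_by_frequency is made up for illustration, and the sketch assumes the values are small non-negative integers such as 0 to 3:

```python
import numpy as np

def relabel_by_frequency(arr):
    """Relabel values so that 0 is the rarest value, 1 the next rarest, etc.

    Hypothetical helper; assumes arr holds small non-negative integers.
    """
    values, counts = np.unique(arr, return_counts=True)
    # positions of the unique values ordered from least to most frequent;
    # a stable sort keeps ties in ascending value order
    order = np.argsort(counts, kind="stable")
    # lookup table: lut[old_value] = new_value
    lut = np.zeros(values.max() + 1, dtype=np.intp)
    lut[values[order]] = np.arange(len(values))
    # integer indexing applies the table to every element, keeping the shape
    return lut[arr]

arr = np.array([[2, 2, 0, 0, 0],
                [1, 3, 3, 3, 3]])
result = relabel_by_frequency(arr)
print(result)
```

For this small 2D example the result is [[1 1 2 2 2], [0 3 3 3 3]], matching the mapping produced by the dictionary approach. Whether it is actually faster on 1161 x 1161 arrays is worth timing on real data.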
Upvotes: 1