DiMa

Reputation: 63

2D NumPy arrays: replacing values according to occurrence

I have many 1161 x 1161 2D arrays whose entries are the numbers 0, 1, 2 and 3. For instance, one of them contains:

521859 zeros, 288972 ones, 481471 twos, 55619 threes.

I would like to find the fastest way to relabel such an array by frequency, so that the least frequent value becomes 0, the second least frequent becomes 1, and so on. The positions stay the same, but the array above would then contain:

55619 zeros, 288972 ones, 481471 twos, 521859 threes.

If there is a very Pythonic way, that would be great, of course.

Thanks in advance for any help!

Upvotes: 0

Views: 212

Answers (1)

Dani Mesejo

Reputation: 61910

You could use np.unique to get the unique elements and their counts, then build a dictionary that maps each old value to its new value (its rank when the values are sorted by count). Finally, apply it to the whole array using np.vectorize:

import numpy as np
from operator import itemgetter

arr = np.array([2, 2, 0, 0, 0, 1, 3, 3, 3, 3])

# get unique elements and counts
counts = zip(*np.unique(arr, return_counts=True))

# create a lookup dictionary value -> i where values are sorted according to frequency
mapping = {value: i for i, (value, _) in enumerate(sorted(counts, key=itemgetter(1)))}

# apply the dictionary in a vectorized way
result = np.vectorize(mapping.get)(arr)

print(result)

Output

[1 1 2 2 2 0 3 3 3 3]

A perhaps cleaner alternative is to use collections.Counter to count the elements and build the mapping dictionary:

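from collections import Counter

# count the elements; flatten first, because iterating a 2D array yields rows, which are not hashable
counts = Counter(arr.ravel().tolist())

# create a lookup dictionary value -> i where values are sorted according to frequency
mapping = {value: i for i, value in enumerate(sorted(counts, key=counts.get))}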
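
Since the question asks for speed, here is a rough sketch of a dictionary-free variant that stays inside NumPy. It is not part of the approaches above: it swaps np.vectorize for an integer lookup table built with np.argsort. The function name relabel_by_frequency is made up for illustration, and ties between equally frequent values are assumed to be broken by the original value order.

```python
import numpy as np


def relabel_by_frequency(arr):
    """Relabel values so the rarest becomes 0, the next rarest 1, and so on.

    Hypothetical helper, not from the answer above.
    """
    # values: sorted unique values; inverse: index into values for each element
    values, inverse, counts = np.unique(
        arr, return_inverse=True, return_counts=True
    )
    # order[i] is the index (into values) of the i-th least frequent value;
    # a stable sort keeps tied values in their original order
    order = np.argsort(counts, kind="stable")
    # rank[j] is the new label for values[j]
    rank = np.empty_like(order)
    rank[order] = np.arange(order.size)
    # look up the new label of every element and restore the input shape
    return rank[inverse].reshape(arr.shape)


arr = np.array([2, 2, 0, 0, 0, 1, 3, 3, 3, 3])
result = relabel_by_frequency(arr)
print(result)
```

The same call should work unchanged on a 2D array, because the result is reshaped to the input's shape. Whether it is actually faster on 1161 x 1161 inputs would need to be timed.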

Upvotes: 1
