Randomize numpy.argsort output in case of ties

Question

I have a NumPy array in which some elements are equal (i.e. there are ties), and I am using np.argsort to find the indices that sort it:

In [29]: x = [1, 2, 1, 1, 5, 2]

In [30]: np.argsort(x)
Out[30]: array([0, 2, 3, 1, 5, 4])

In [31]: np.argsort(x)
Out[31]: array([0, 2, 3, 1, 5, 4])

As can be seen, running argsort twice gives identical outputs. However, array([2, 3, 0, 5, 1, 4]) is an equally valid output, because some elements of the original array are equal. Can I make argsort return such "randomized" outputs when there are ties in my array? If not, what is a workaround? I don't want to bias my choice when picking the lowest values in the array.

Divakar · Accepted Answer

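One trick, for integer-valued arrays, is to add uniform noise in the range [0, 1) and then argsort. Because distinct integers differ by at least 1, the noise cannot change the relative order of unequal elements; it only shuffles the elements within each group of ties, giving randomized sort indices restricted to those groups: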

(x+np.random.rand(len(x))).argsort()
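
The noise trick above relies on distinct values being at least 1 apart, so it is not safe for general floating-point data. As a possible alternative (not part of the original answer), the sketch below shuffles the positions first and then applies a stable sort, so tied elements keep their shuffled, and therefore random, relative order. The helper name argsort_random_ties and its rng parameter are hypothetical and used only for illustration.

```python
import numpy as np

def argsort_random_ties(x, rng=None):
    """Hypothetical helper: argsort whose ties are broken at random."""
    x = np.asarray(x)
    # Assumption: the caller may pass a seeded Generator for reproducibility.
    rng = np.random.default_rng() if rng is None else rng
    # Randomly permute the positions of the input.
    perm = rng.permutation(len(x))
    # A stable sort keeps equal values in their (shuffled) order.
    order = np.argsort(x[perm], kind="stable")
    # Map positions in the shuffled array back to original indices.
    return perm[order]

x = [1, 2, 1, 1, 5, 2]
idx = argsort_random_ties(x)
# idx always sorts x, but the order inside each tie group varies per call.
print(idx, np.asarray(x)[idx])
```

Because the values themselves are never modified, this approach should work for any sortable dtype, at the cost of one extra permutation of the input.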
