Tyson Graham

Reputation: 227

How would I write NumPy argmode()?

I know that argmax() returns the indices of the maximum values along an axis.

I also know that in the case of multiple occurrences of the maximum values, the index corresponding to the first occurrence is returned.

argmax() works perfectly when you want to find the maximum value and its index. How would a numpy.argmode() function be written?

In other words, how would one write a function that finds the mode of a NumPy array and returns the index of its first occurrence?

To be clear, there is no numpy.argmode; the functionality of such a function is what I am after.

I understand that the mode would have multiple occurrences. We should be able to get it to behave like argmax where if we have multiple occurrences, it simply returns the value and index of the first occurrence.

An example of what I would want is:

a = numpy.array([ 6, 3, 4, 1, 2, 2, 2])
numberIWant = numpy.argmode(a)
print(numberIWant)
# should print 4 (the index of the first occurrence of the mode)

I tried using:

from scipy import stats
num = stats.mode(a)[0][0]  # the mode value
numpy.argwhere(a==num)[0][0]  # index of its first occurrence

This did work but I'm looking for a more efficient and concise solution. Any ideas?

Upvotes: 3

Views: 411

Answers (2)

Jaime

Reputation: 67507

If you want to stay within NumPy, you can use some of the extra returns of np.unique to get what you want:

>>> _, idx, cnt = np.unique(a, return_index=True, return_counts=True)
>>> idx[np.argmax(cnt)]
4

EDIT

To provide some context on what is going on... np.unique always returns a sorted array of unique values. The optional return_index provides another output array, with the index in which the first occurrence of each unique value happens. And the optional return_counts provides an extra output with the number of occurrences of each unique value. With those building blocks, all you need to do is return the item of the index array at the position where the highest count happens.
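
Wrapped in a function, the steps above might look like the following minimal sketch. The name argmode is hypothetical (NumPy has no such function), and the input is flattened to 1-D as an assumption. Note one difference from argmax: because np.unique sorts its output, a tie between equally frequent values goes to the smallest value, not to the value that appears first in the array.

```python
import numpy as np

def argmode(a):
    """Index of the first occurrence of the most frequent value.

    Hypothetical helper, not part of NumPy; the input is flattened.
    """
    a = np.asarray(a).ravel()
    # Sorted unique values, index of each value's first occurrence, and counts.
    _, first_idx, counts = np.unique(a, return_index=True, return_counts=True)
    # argmax returns the first maximum, i.e. the smallest value among tied modes.
    return int(first_idx[np.argmax(counts)])

a = np.array([6, 3, 4, 1, 2, 2, 2])
print(argmode(a))  # 4
```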

Upvotes: 3

hpaulj

Reputation: 231738

What is it that makes one solution more 'elegant' than another? Shortness? Speed? Cleverness? Most Pythonic? Numpy-onic?

To me, speed is more important than compactness. I can always make a solution more compact by wrapping it in a function call. Actually, robustness is even more important.


A non-numpy route is to use handy tools in collections, as sketched here:

In [342]: a = numpy.array([ 6, 3, 4, 1, 2, 2, 2])

In [343]: import collections

Use Counter to quickly get the mode (value):

In [344]: c=collections.Counter(a)
In [345]: c
Out[345]: Counter({2: 3, 1: 1, 3: 1, 4: 1, 6: 1})
In [347]: mode=c.most_common(1)[0][0]
In [348]: mode
Out[348]: 2

Use defaultdict to collect the locations of all values:

In [349]: adict=collections.defaultdict(list)
In [350]: for i,v in enumerate(a):
    adict[v].append(i)
In [351]: adict[mode]
Out[351]: [4, 5, 6]

I could have searched adict for the longest list, but I suspect Counter is faster.
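
For comparison, the dictionary search mentioned above can be sketched as follows. This is only an illustration: the variable names are made up here, and it relies on dicts preserving insertion order (Python 3.7+), so a tie between equally frequent values goes to the value that appears first in the array.

```python
import collections
import numpy as np

a = np.array([6, 3, 4, 1, 2, 2, 2])

# Map each value to the list of positions where it occurs.
positions = collections.defaultdict(list)
for i, v in enumerate(a.tolist()):
    positions[v].append(i)

# max() keeps the first entry with the longest list, in insertion order.
mode_value, mode_positions = max(positions.items(), key=lambda kv: len(kv[1]))
first_index = mode_positions[0]
print(mode_value, first_index)  # 2 4
```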

Actually, when I know the mode, all I need is where - just as your use of stats shows:

In [352]: np.where(a==mode)
Out[352]: (array([4, 5, 6], dtype=int32),)

In time tests on this small array, Counter wins.

In [358]: timeit stats.mode(a)[0][0]
1000 loops, best of 3: 337 µs per loop
In [359]: timeit collections.Counter(a).most_common(1)[0][0]
10000 loops, best of 3: 20 µs per loop

Another possible tool is bincount:

In [367]: np.bincount(a)
Out[367]: array([0, 1, 3, 1, 1, 0, 1], dtype=int32)
In [368]: timeit np.argmax(np.bincount(a))
100000 loops, best of 3: 3.29 µs per loop

and with the where:

In [373]: timeit np.where(a==np.argmax(np.bincount(a)))[0][0]
100000 loops, best of 3: 11.2 µs per loop

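It's fast, but it isn't fully general: np.bincount only accepts non-negative integers, and its output has a.max()+1 entries, so large values cost memory.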
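
A guarded version of the bincount route might look like this sketch. The function name argmode_bincount is hypothetical, and raising ValueError for unsupported input is a design choice made here, not something from the timings above. It swaps np.where(...)[0][0] for np.argmax on the boolean mask, which also returns the position of the first match.

```python
import numpy as np

def argmode_bincount(a):
    """First index of the mode, for non-negative integer data only.

    Hypothetical helper, not part of NumPy.
    """
    a = np.asarray(a).ravel()
    # np.bincount cannot count floats or negative values, so reject them early.
    if not np.issubdtype(a.dtype, np.integer) or a.size == 0 or a.min() < 0:
        raise ValueError("expected a non-empty array of non-negative integers")
    mode = np.argmax(np.bincount(a))  # most frequent value (smallest on ties)
    return int(np.argmax(a == mode))  # position of the first True in the mask

print(argmode_bincount(np.array([6, 3, 4, 1, 2, 2, 2])))  # 4
```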

Upvotes: 2
