Reputation: 227
I know that argmax() returns the indices of the maximum values along an axis.
I also know that in the case of multiple occurrences of the maximum values, the index corresponding to the first occurrence is returned.
argmax() works perfectly when you want to find the maximum value and its index. How would a numpy.argmode() function be written?
In other words, how would you write a function that finds the mode of a numpy array and returns the index of its first occurrence?
To be clear, there is no numpy.argmode; the functionality of such a function is what I'm after.
I understand that the mode would have multiple occurrences. We should be able to get it to behave like argmax where if we have multiple occurrences, it simply returns the value and index of the first occurrence.
An example of what I would want is:
a = numpy.array([ 6, 3, 4, 1, 2, 2, 2])
numberIWant = numpy.argmode(a)
print(numberIWant)
# should print 4 (the index of the first occurrence of the mode)
I tried using:
num = stats.mode(a)[0][0]
numpy.argwhere(a == num)[0][0]
This did work but I'm looking for a more efficient and concise solution. Any ideas?
Upvotes: 3
Views: 411
Reputation: 67507
If you want to stay within NumPy, you can use some of the extra returns of np.unique to get what you want:
>>> _, idx, cnt = np.unique(a, return_index=True, return_counts=True)
>>> idx[np.argmax(cnt)]
4
EDIT
To provide some context on what is going on: np.unique always returns a sorted array of unique values. The optional return_index provides another output array, holding the index at which the first occurrence of each unique value happens. The optional return_counts provides an extra output with the number of occurrences of each unique value. With those building blocks, all you need to do is return the item of the index array at the position where the highest count happens.
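The two-liner above can be wrapped into a small helper. This is only a sketch: the name argmode is hypothetical (NumPy has no such function), and it assumes a non-empty input that is treated as 1-D. Note that if several values tie for the highest count, np.argmax picks the smallest of them, because np.unique returns its values sorted.

```python
import numpy as np

def argmode(a):
    """Return (mode, index of its first occurrence) for a non-empty array.

    Hypothetical helper, not part of NumPy. The input is flattened first.
    """
    a = np.asarray(a).ravel()
    values, first_idx, counts = np.unique(
        a, return_index=True, return_counts=True
    )
    # Position of the highest count; ties resolve to the smallest value
    # because `values` is sorted.
    k = np.argmax(counts)
    return values[k], first_idx[k]

a = np.array([6, 3, 4, 1, 2, 2, 2])
mode, idx = argmode(a)
print(mode, idx)  # 2 4
```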
Upvotes: 3
Reputation: 231738
What is it that makes one solution more 'elegant' than another? Shortness? Speed? Cleverness? Most Pythonic? Numpy-onic?
To me speed is more important than compactness. I can always make a solution more compact by wrapping it in a function call. Actually robustness is even more important.
A non-numpy route is to use handy tools in collections, as sketched here:
In [342]: a = numpy.array([ 6, 3, 4, 1, 2, 2, 2])
In [343]: import collections
Use Counter to quickly get the mode (value):
In [344]: c=collections.Counter(a)
In [345]: c
Out[345]: Counter({2: 3, 1: 1, 3: 1, 4: 1, 6: 1})
In [347]: mode=c.most_common(1)[0][0]
In [348]: mode
Out[348]: 2
Use defaultdict to collect the locations of all values:
In [349]: adict=collections.defaultdict(list)
In [350]: for i,v in enumerate(a):
   .....:     adict[v].append(i)
In [351]: adict[mode]
Out[351]: [4, 5, 6]
I could have searched adict for the longest list, but I suspect Counter is faster.
Actually, when I know the mode, all I need is where, just as your use of stats shows:
In [352]: np.where(a==mode)
Out[352]: (array([4, 5, 6], dtype=int32),)
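The Counter and where steps can be combined into one function. This is a rough sketch: the name argmode_counter is made up for illustration, and it assumes a non-empty array whose elements are hashable once converted with tolist().

```python
import collections
import numpy as np

def argmode_counter(a):
    """Return (mode, index of its first occurrence) using Counter.

    Hypothetical helper; assumes a non-empty array of hashable values.
    """
    a = np.asarray(a).ravel()
    # tolist() converts to plain Python scalars before counting.
    mode = collections.Counter(a.tolist()).most_common(1)[0][0]
    # flatnonzero gives every position of the mode; keep the first.
    first = int(np.flatnonzero(a == mode)[0])
    return mode, first

a = np.array([6, 3, 4, 1, 2, 2, 2])
print(argmode_counter(a))  # (2, 4)
```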
In time tests on this small array, Counter wins.
In [358]: timeit stats.mode(a)[0][0]
1000 loops, best of 3: 337 µs per loop
In [359]: timeit collections.Counter(a).most_common(1)[0][0]
10000 loops, best of 3: 20 µs per loop
Another possible tool is bincount:
In [367]: np.bincount(a)
Out[367]: array([0, 1, 3, 1, 1, 0, 1], dtype=int32)
In [368]: timeit np.argmax(np.bincount(a))
100000 loops, best of 3: 3.29 µs per loop
and with the where:
In [373]: timeit np.where(a==np.argmax(np.bincount(a)))[0][0]
100000 loops, best of 3: 11.2 µs per loop
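It's fast, but I'm not sure if it is general enough: bincount only accepts non-negative integers, and its output has one entry for every integer up to the largest value in the array.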
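A guarded version of the bincount approach might look like the sketch below. The name argmode_bincount and the explicit input check are my own assumptions, added so the limits of bincount show up as a clear error rather than a surprise. np.argmax on the boolean mask returns the position of the first True, which replaces the where call.

```python
import numpy as np

def argmode_bincount(a):
    """Return (mode, index of its first occurrence) via np.bincount.

    Hypothetical helper; only valid for non-empty arrays of
    non-negative integers, which is checked up front.
    """
    a = np.asarray(a).ravel()
    if not np.issubdtype(a.dtype, np.integer) or a.size == 0 or a.min() < 0:
        raise ValueError("needs a non-empty array of non-negative integers")
    # bincount[i] is the number of times i occurs, so argmax is the mode.
    mode = int(np.argmax(np.bincount(a)))
    # argmax of a boolean array is the index of the first True.
    return mode, int(np.argmax(a == mode))

a = np.array([6, 3, 4, 1, 2, 2, 2])
print(argmode_bincount(a))  # (2, 4)
```

Memory use grows with the largest value in the array, so a sparse array with one very large entry is a poor fit for this route.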
Upvotes: 2