Reputation: 1066
I'm trying to calculate the mode (most frequent value) of a list of values in Python. I came up with a solution, which gave the wrong answer anyway, but I then realised that my data may be multimodal;
i.e. for 1,1,2,3,4,4 the modes are 1 and 4.
Here is what I came up with so far:
def mode(valueList):
    frequencies = {}
    for value in valueList:
        if value in frequencies:
            frequencies[value] += 1
        else:
            frequencies[value] = 1
    mode = max(frequencies.itervalues())
    return mode
I think the problem here is that I'm returning the highest count rather than the value that count belongs to. Anyway, can anyone suggest a better way of doing this that would work where there is more than one mode? Or, failing that, how can I fix what I've got so far and identify a single mode?
As you can probably tell, I'm very new to Python. Thanks for the help.
Edit: I should have mentioned I'm on Python 2.4.
Upvotes: 7
Views: 12520
Reputation: 61646
Note that starting in Python 3.8, the standard library includes the statistics.multimode function to return a list of the most frequently occurring values in the order they were first encountered:
from statistics import multimode
multimode([1, 1, 2, 3, 4, 4])
# [1, 4]
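To show how the function behaves beyond the numeric case, here is a small sketch with a string and an empty input. The variable names are only illustrative, and it assumes Python 3.8 or later.

```python
from statistics import multimode

# Ties are reported in the order the values were first seen.
numbers = multimode([1, 1, 2, 3, 4, 4])

# Any iterable of hashable items works, for example the characters of a string.
letters = multimode("aabbbbccddddeeffffgg")

# An empty input gives an empty list rather than raising an error.
empty = multimode([])

print(numbers)  # [1, 4]
print(letters)  # ['b', 'd', 'f']
print(empty)    # []
```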
Upvotes: 5
Reputation: 363487
In Python >=2.7, use collections.Counter for frequency tables.
from collections import Counter
from itertools import takewhile
data = [1,1,2,3,4,4]
freq = Counter(data)
mostfreq = freq.most_common()
modes = list(takewhile(lambda x_f: x_f[1] == mostfreq[0][1], mostfreq))
Note the use of an anonymous function (lambda) that checks whether a pair (_, f) has the same frequency as the most frequent element. The resulting modes list holds (value, frequency) pairs rather than bare values.
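If only the values are wanted, the pairs can be unpacked afterwards. The following is a minimal sketch of that idea. The helper name modes_with_counter is hypothetical, and the empty-input check is an addition that the answer above does not include.

```python
from collections import Counter
from itertools import takewhile


def modes_with_counter(data):
    """Return the most frequent values in data (hypothetical helper)."""
    # most_common() yields (value, count) pairs, highest count first.
    ranked = Counter(data).most_common()
    if not ranked:
        return []  # no data, so no modes
    top_count = ranked[0][1]
    # Keep pairs while their count still equals the top count, then drop the counts.
    tied = takewhile(lambda pair: pair[1] == top_count, ranked)
    return [value for value, _ in tied]


# Sorted here so the printed result does not depend on how ties are ordered.
print(sorted(modes_with_counter([1, 1, 2, 3, 4, 4])))  # [1, 4]
```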
Upvotes: 5
Reputation: 150947
Well, the first problem is that yes, you're returning the value in frequencies rather than the key. That means you get the count of the mode, not the mode itself. Normally, to get the mode, you'd use the key keyword argument to max, like so:
>>> max(frequencies, key=frequencies.get)
But in 2.4 the key argument doesn't exist! Here's an approach that I believe will work in 2.4:
>>> import random
>>> l = [random.randrange(0, 5) for _ in range(50)]
>>> frequencies = {}
>>> for i in l:
...     frequencies[i] = frequencies.get(i, 0) + 1
...
>>> frequencies
{0: 11, 1: 13, 2: 8, 3: 8, 4: 10}
>>> mode = max((v, k) for k, v in frequencies.iteritems())[1]
>>> mode
1
>>> max_freq = max(frequencies.itervalues())
>>> modes = [k for k, v in frequencies.iteritems() if v == max_freq]
>>> modes
[1]
I prefer the decorate-sort-undecorate idiom to the cmp keyword. I think it's more readable. Could be that's just me.
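For readers on Python 3, where iteritems and itervalues no longer exist, the same two ideas might look roughly like this. It is a sketch rather than a drop-in replacement: the function names count_values, single_mode and all_modes are made up for illustration.

```python
def count_values(values):
    """Build a {value: count} table with a plain dict."""
    frequencies = {}
    for value in values:
        frequencies[value] = frequencies.get(value, 0) + 1
    return frequencies


def single_mode(values):
    """Return one most frequent value; raises ValueError on empty input."""
    frequencies = count_values(values)
    # key=frequencies.get makes max compare the keys by their counts.
    return max(frequencies, key=frequencies.get)


def all_modes(values):
    """Return every value that shares the highest count."""
    frequencies = count_values(values)
    if not frequencies:
        return []
    max_freq = max(frequencies.values())
    return [k for k, v in frequencies.items() if v == max_freq]


print(single_mode([0, 1, 1, 2]))             # 1
print(sorted(all_modes([1, 1, 2, 3, 4, 4])))  # [1, 4]
```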
Upvotes: 4
Reputation: 49177
You can keep track of the top value while iterating, something like this:
def mode(valueList):
    frequencies = {}
    mx = None
    for value in valueList:
        if value in frequencies:
            frequencies[value] += 1
        else:
            frequencies[value] = 1
        if not mx or frequencies[value] > mx[1]:
            mx = (value, frequencies[value])
    mode = mx[0]
    return mode
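Another approach for multiple modes uses nlargest, which can give you the N entries of a dictionary with the largest counts. Note that you have to choose nmodes yourself, so the result is the nmodes most frequent values, which are not necessarily all tied for first place. Also note that the key argument to nlargest was only added in Python 2.5: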
from heapq import nlargest
import operator
def mode(valueList, nmodes):
    frequencies = {}
    for value in valueList:
        frequencies[value] = frequencies.get(value, 0) + 1
    return [x[0] for x in nlargest(nmodes,frequencies.iteritems(),operator.itemgetter(1))]
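To make the behaviour of a fixed n concrete, here is a Python 3 sketch of the same nlargest idea. The name most_frequent is hypothetical, and items() stands in for iteritems(). When the data has fewer modes than n, the extra slots are filled with less frequent values.

```python
from heapq import nlargest
from operator import itemgetter


def most_frequent(values, n):
    """Return the n values with the highest counts (not necessarily all modes)."""
    frequencies = {}
    for value in values:
        frequencies[value] = frequencies.get(value, 0) + 1
    # Rank the (value, count) pairs by count and keep the top n.
    ranked = nlargest(n, frequencies.items(), key=itemgetter(1))
    return [value for value, _ in ranked]


# Two genuine modes: both slots are filled by tied values.
two_modes = most_frequent([1, 1, 2, 3, 4, 4], 2)
print(sorted(two_modes))  # [1, 4]

# Only one genuine mode: the second slot holds a value that is not a mode.
one_mode = most_frequent([1, 1, 1, 2, 3], 2)
print(one_mode[0], len(one_mode))  # 1 2
```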
Upvotes: 1