Captastic

Reputation: 1066

Calculating the mode in a multimodal list in Python

I'm trying to calculate the mode (most frequent value) of a list of values in Python. I came up with a solution, which gave the wrong answer anyway, but I then realised that my data may be multimodal;

i.e. 1,1,2,3,4,4 mode = 1 & 4

Here is what I came up with so far:

def mode(valueList):
  frequencies = {}
  for value in valueList:
    if value in frequencies:
      frequencies[value] += 1
    else:
      frequencies[value] = 1
  mode = max(frequencies.itervalues())
  return mode

I think the problem here is that I'm outputting the maximum count itself rather than the key it belongs to. Anyway, can anyone suggest a better way of doing this that could work where there is more than one mode? Or failing that, how can I fix what I've got so far and identify a single mode?

As you can probably tell I'm very new to Python, thanks for the help.

edit: should have mentioned I'm in Python 2.4

Upvotes: 7

Views: 12520

Answers (4)

Xavier Guihot

Reputation: 61646

Note that starting in Python 3.8, the standard library includes the statistics.multimode function to return a list of the most frequently occurring values in the order they were first encountered:

from statistics import multimode

multimode([1, 1, 2, 3, 4, 4])
# [1, 4]

Upvotes: 5

Fred Foo

Reputation: 363487

In Python >=2.7, use collections.Counter for frequency tables.

from collections import Counter
from itertools import takewhile

data = [1,1,2,3,4,4]
freq = Counter(data)
mostfreq = freq.most_common()
modes = list(takewhile(lambda x_f: x_f[1] == mostfreq[0][1], mostfreq))

Note the use of an anonymous function (lambda) that checks whether a pair (_, f) has the same frequency as the most frequent element.
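The `modes` list above holds (value, count) pairs. If only the values are wanted, a minimal sketch wrapping the same idea in a function could look like this. The name `modes_of` is hypothetical, and the empty-input behaviour (returning an empty list) is an assumption rather than part of the answer:

```python
from collections import Counter
from itertools import takewhile

def modes_of(data):
    """Return every value that ties for the highest count in data."""
    ranked = Counter(data).most_common()  # (value, count) pairs, highest count first
    if not ranked:
        return []  # assumption: an empty input has no mode
    top = ranked[0][1]
    # Keep taking pairs while their count equals the top count, then drop the counts.
    return [value for value, count in takewhile(lambda pair: pair[1] == top, ranked)]

print(modes_of([1, 1, 2, 3, 4, 4]))
```

For the question's sample data this should yield both 1 and 4.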

Upvotes: 5

senderle

Reputation: 150947

Well, the first problem is that yes, you're returning the value in frequencies rather than the key. That means you get the count of the mode, not the mode itself. Normally, to get the mode, you'd use the key keyword argument to max, like so:

>>> max(frequencies, key=frequencies.get)

But in 2.4 the key argument doesn't exist! Here's an approach that I believe will work in 2.4:

>>> import random
>>> l = [random.randrange(0, 5) for _ in range(50)]
>>> frequencies = {}
>>> for i in l:
...     frequencies[i] = frequencies.get(i, 0) + 1
... 
>>> frequencies
{0: 11, 1: 13, 2: 8, 3: 8, 4: 10}
>>> mode = max((v, k) for k, v in frequencies.iteritems())[1]
>>> mode
1
>>> max_freq = max(frequencies.itervalues())
>>> modes = [k for k, v in frequencies.iteritems() if v == max_freq]
>>> modes
[1]

I prefer the decorate-sort-undecorate idiom to the cmp keyword. I think it's more readable. Could be that's just me.
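For readers on Python 3, where `iteritems` and `itervalues` no longer exist, the same two-pass idea can be sketched with `items()` and `values()`. The function name `all_modes` and the choice to return an empty list for empty input are assumptions made for illustration:

```python
def all_modes(values):
    """Return a list of every value that occurs with the maximum frequency."""
    frequencies = {}
    for value in values:
        frequencies[value] = frequencies.get(value, 0) + 1
    if not frequencies:
        return []  # assumption: no data means no mode
    max_freq = max(frequencies.values())
    # Second pass: collect every key whose count equals the maximum.
    return [k for k, v in frequencies.items() if v == max_freq]

print(all_modes([1, 1, 2, 3, 4, 4]))
```

Because it collects every key with the top count, this handles the multimodal case from the question as well as the single-mode one.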

Upvotes: 4

Not_a_Golfer

Reputation: 49177

You can keep track of the top value and its count while iterating, something like this:

def mode(valueList):
  frequencies = {}
  mx = None
  for value in valueList:
    if value in frequencies:
      frequencies[value] += 1
    else:
      frequencies[value] = 1
    if not mx or frequencies[value] > mx[1]:
      mx = (value, frequencies[value])

  mode = mx[0]
  return mode

Another approach for multiple modes uses nlargest, which can give you the N largest items of an iterable, here the dictionary's (value, count) pairs ranked by count:

from heapq import nlargest
import operator

def mode(valueList, nmodes):
  frequencies = {}

  for value in valueList:
    frequencies[value] = frequencies.get(value, 0) + 1

  return [x[0] for x in nlargest(nmodes,frequencies.iteritems(),operator.itemgetter(1))]
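One caveat with this approach: `nlargest(nmodes, ...)` returns the `nmodes` most frequent values whether or not they tie for the top count, so the caller has to know the number of modes in advance. A small sketch in Python 3 syntax (using `items()` instead of `iteritems()`; the name `top_n` is hypothetical) illustrates the difference:

```python
from heapq import nlargest
from operator import itemgetter

def top_n(values, n):
    """Return the n most frequent values, regardless of ties."""
    frequencies = {}
    for value in values:
        frequencies[value] = frequencies.get(value, 0) + 1
    # Rank the (value, count) pairs by count and keep only the values.
    return [k for k, _ in nlargest(n, frequencies.items(), key=itemgetter(1))]

data = [1, 1, 2, 3, 4, 4]
print(top_n(data, 2))  # the two values that tie for the top count
print(top_n(data, 3))  # additionally includes a value that occurs only once
```

Asking for three "modes" here returns a third value that is not a mode at all, so filtering on the maximum count is safer when the number of modes is unknown.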

Upvotes: 1
