Nimrodshn
Nimrodshn

Reputation: 971

OverflowError: (34, 'Result too large') in Python

I'm writing a small script that calculates and plots a rarefaction curve, on a given data. (As described in Wikipedia: http://en.wikipedia.org/wiki/Rarefaction_%28ecology%29) But I try to plot the function for values larger than 170 i keep getting the following error: OverflowError: (34, 'Result too large')

Here is a sample of code with some data:

import numpy as np
import math
import matplotlib.pyplot as plt
import decimal

def pltCurve():
    data = [[367, 172, 503, 1404, 8, 83, 7, 2, 7, 1, 0, 6, 31, 0, 6, 40, 0, 18, 132, 41, 1, 2, 15, 1, 0, 10, 0, 63, 59, 3, 0, 7, 9, 9, 4, 0, 2, 0, 23, 20, 4, 0, 0, 1, 11, 55, 0, 0, 1, 1, 0, 1, 4, 11, 0, 10, 6, 0, 4, 0, 443, 2, 49, 29, 0, 5, 6, 0, 0, 1, 0, 0, 0, 0, 0, 32, 0, 1, 14, 1, 0, 1, 3, 1, 1, 0, 7, 0, 2, 32, 2, 1, 55, 0, 21, 1, 7, 2, 0, 0, 0, 0, 0, 0, 0, 1, 76, 5, 9, 28, 1, 0, 72, 0, 0, 0, 0, 61, 6, 5, 0, 5, 2, 0, 1, 9, 1, 0, 1, 1, 1, 1, 1, 1, 34, 28, 1, 1, 1, 3, 3, 0, 0, 1, 0, 0, 3, 1, 3, 55, 19, 18, 87, 0, 1, 2, 6, 15, 10, 1, 2]]

    for d in range(len(data)):
        x = np.arange(1,170 , 10)
        y = computeFn(d,x)
        #plt.plot(x,y)
        plt.errorbar(x,y,yerr=0.95)

    plt.show()

def computeFn(i, n):
    N = 4467
    res = []
    r = Decimal(0)
    numOfGroups = 161
    data = [[367, 172, 503, 1404, 8, 83, 7, 2, 7, 1, 0, 6, 31, 0, 6, 40, 0, 18, 132, 41, 1, 2, 15, 1, 0, 10, 0, 63, 59, 3, 0, 7, 9, 9, 4, 0, 2, 0, 23, 20, 4, 0, 0, 1, 11, 55, 0, 0, 1, 1, 0, 1, 4, 11, 0, 10, 6, 0, 4, 0, 443, 2, 49, 29, 0, 5, 6, 0, 0, 1, 0, 0, 0, 0, 0, 32, 0, 1, 14, 1, 0, 1, 3, 1, 1, 0, 7, 0, 2, 32, 2, 1, 55, 0, 21, 1, 7, 2, 0, 0, 0, 0, 0, 0, 0, 1, 76, 5, 9, 28, 1, 0, 72, 0, 0, 0, 0, 61, 6, 5, 0, 5, 2, 0, 1, 9, 1, 0, 1, 1, 1, 1, 1, 1, 34, 28, 1, 1, 1, 3, 3, 0, 0, 1, 0, 0, 3, 1, 3, 55, 19, 18, 87, 0, 1, 2, 6, 15, 10, 1, 2]]
    #print N
    for k in n:
        r = (sum((logchoose(N-N_i,k)) for N_i in data[i]))*(logchoose(N,k))**-1
        r = Decimal(numOfGroups) - r
        print r # Debug
        res.append(r)
    return res

def logchoose(ni, ki):

    """
    :rtype : N choose K Function
    """
    try:
        lgn1 = sum(math.log10(ii) for ii in range(1,ni))
        lgk1 = sum(math.log10(ii) for ii in range(1,ki))
        lgnk1 = sum(math.log10(ii) for ii in range(1,ni-ki+1))
    except ValueError:
        #print ni,ki
        raise ValueError
    #print 10**(lgn1 - (lgnk1 + lgk1))
    return Decimal((10**(lgn1 - (lgnk1 + lgk1))))


pltCurve()

I've seen solutions to this problem using 'Decimal' module. I've played with it and still the error was raised. Any suggestions? Regards.

Edit: Here is the exact traceback:

    Traceback (most recent call last):
  File "C:\Users\user\Documents\Rarefactor\test.py", line 48, in <module>
    pltCurve()
  File "C:\Users\user\Documents\Rarefactor\test.py", line 11, in pltCurve
    y = computeFn(d,x)
  File "C:\Users\user\Documents\Rarefactor\test.py", line 26, in computeFn
    r = (sum((logchoose(N-N_i,k)) for N_i in data[i]))*(logchoose(N,k))**-1
  File "C:\Users\user\Documents\Rarefactor\test.py", line 26, in <genexpr>
    r = (sum((logchoose(N-N_i,k)) for N_i in data[i]))*(logchoose(N,k))**-1
  File "C:\Users\user\Documents\Rarefactor\test.py", line 45, in logchoose
    return (10**(lgn1 - (lgnk1 + lgk1)))
OverflowError: (34, 'Result too large')

Upvotes: 2

Views: 12487

Answers (1)

abarnert
abarnert

Reputation: 365925

Your exception comes from this line:

return (10**(lgn1 - (lgnk1 + lgk1)))

You tried to fix it by using Decimal like this:

return Decimal(10**(lgn1 - (lgnk1 + lgk1)))

But that won't help. Because lgn1, lgnk1, and lgk1 are float values, you're trying to do the arithmetic with float values, and then convert the result to a Decimal after it's done. Because the float arithmetic overflows, it never gets to the conversion.

What you need to do is make the arithmetic happen on Decimal values in the first place. For example:

lgn1 = Decimal(sum(math.log10(ii) for ii in range(1,ni)))
lgk1 = Decimal(sum(math.log10(ii) for ii in range(1,ki)))
lgnk1 = Decimal(sum(math.log10(ii) for ii in range(1,ni-ki+1)))

Now, when you do this:

return (10**(lgn1 - (lgnk1 + lgk1)))

… you've got Decimal arithmetic, not float, and it won't overflow (as long as your Decimal context is large enough for these numbers, of course).

But you probably want to push the Decimal as high up the chain as possible, not as low as possible. In this case, that's only one level up—calling math.log10 on an integer gives you a float, but calling the log10 method on a Decimal gives you a Decimal, so:

lgn1 = sum(Decimal(ii).log10() for ii in range(1, ni))

Meanwhile, for future reference:

I dont see how i can minimize this code further aside from erasing the lines of plotting.

Well, first, why not erase the lines of plotting then?

But, more importantly, you know that the exception happens on the last line of the logchoose function, and you know (or could know, by, say, adding a print ni, ki or running in the debugger) what arguments cause it to raise. So you could reduce the whole thing to the logchoose definition plus print logchoose(273, 114) (or whatever the arguments are).

Besides being a lot shorter, this would also completely take numpy and matplotlib out of the equation, so people who know nothing about those libraries but know a lot about Python (which is the vast majority, and includes people who are smarter than me, dbliss, and Nimrodshn, or at least smarter than me) could solve your problem.

Upvotes: 2

Related Questions