Reputation: 971
I'm writing a small script that calculates and plots a rarefaction curve, on a given data. (As described in Wikipedia: http://en.wikipedia.org/wiki/Rarefaction_%28ecology%29) But I try to plot the function for values larger than 170 i keep getting the following error: OverflowError: (34, 'Result too large')
Here is a sample of code with some data:
import numpy as np
import math
import matplotlib.pyplot as plt
import decimal
def pltCurve():
data = [[367, 172, 503, 1404, 8, 83, 7, 2, 7, 1, 0, 6, 31, 0, 6, 40, 0, 18, 132, 41, 1, 2, 15, 1, 0, 10, 0, 63, 59, 3, 0, 7, 9, 9, 4, 0, 2, 0, 23, 20, 4, 0, 0, 1, 11, 55, 0, 0, 1, 1, 0, 1, 4, 11, 0, 10, 6, 0, 4, 0, 443, 2, 49, 29, 0, 5, 6, 0, 0, 1, 0, 0, 0, 0, 0, 32, 0, 1, 14, 1, 0, 1, 3, 1, 1, 0, 7, 0, 2, 32, 2, 1, 55, 0, 21, 1, 7, 2, 0, 0, 0, 0, 0, 0, 0, 1, 76, 5, 9, 28, 1, 0, 72, 0, 0, 0, 0, 61, 6, 5, 0, 5, 2, 0, 1, 9, 1, 0, 1, 1, 1, 1, 1, 1, 34, 28, 1, 1, 1, 3, 3, 0, 0, 1, 0, 0, 3, 1, 3, 55, 19, 18, 87, 0, 1, 2, 6, 15, 10, 1, 2]]
for d in range(len(data)):
x = np.arange(1,170 , 10)
y = computeFn(d,x)
#plt.plot(x,y)
plt.errorbar(x,y,yerr=0.95)
plt.show()
def computeFn(i, n):
N = 4467
res = []
r = Decimal(0)
numOfGroups = 161
data = [[367, 172, 503, 1404, 8, 83, 7, 2, 7, 1, 0, 6, 31, 0, 6, 40, 0, 18, 132, 41, 1, 2, 15, 1, 0, 10, 0, 63, 59, 3, 0, 7, 9, 9, 4, 0, 2, 0, 23, 20, 4, 0, 0, 1, 11, 55, 0, 0, 1, 1, 0, 1, 4, 11, 0, 10, 6, 0, 4, 0, 443, 2, 49, 29, 0, 5, 6, 0, 0, 1, 0, 0, 0, 0, 0, 32, 0, 1, 14, 1, 0, 1, 3, 1, 1, 0, 7, 0, 2, 32, 2, 1, 55, 0, 21, 1, 7, 2, 0, 0, 0, 0, 0, 0, 0, 1, 76, 5, 9, 28, 1, 0, 72, 0, 0, 0, 0, 61, 6, 5, 0, 5, 2, 0, 1, 9, 1, 0, 1, 1, 1, 1, 1, 1, 34, 28, 1, 1, 1, 3, 3, 0, 0, 1, 0, 0, 3, 1, 3, 55, 19, 18, 87, 0, 1, 2, 6, 15, 10, 1, 2]]
#print N
for k in n:
r = (sum((logchoose(N-N_i,k)) for N_i in data[i]))*(logchoose(N,k))**-1
r = Decimal(numOfGroups) - r
print r # Debug
res.append(r)
return res
def logchoose(ni, ki):
"""
:rtype : N choose K Function
"""
try:
lgn1 = sum(math.log10(ii) for ii in range(1,ni))
lgk1 = sum(math.log10(ii) for ii in range(1,ki))
lgnk1 = sum(math.log10(ii) for ii in range(1,ni-ki+1))
except ValueError:
#print ni,ki
raise ValueError
#print 10**(lgn1 - (lgnk1 + lgk1))
return Decimal((10**(lgn1 - (lgnk1 + lgk1))))
pltCurve()
I've seen solutions to this problem using 'Decimal' module. I've played with it and still the error was raised. Any suggestions? Regards.
Edit: Here is the exact traceback:
Traceback (most recent call last):
File "C:\Users\user\Documents\Rarefactor\test.py", line 48, in <module>
pltCurve()
File "C:\Users\user\Documents\Rarefactor\test.py", line 11, in pltCurve
y = computeFn(d,x)
File "C:\Users\user\Documents\Rarefactor\test.py", line 26, in computeFn
r = (sum((logchoose(N-N_i,k)) for N_i in data[i]))*(logchoose(N,k))**-1
File "C:\Users\user\Documents\Rarefactor\test.py", line 26, in <genexpr>
r = (sum((logchoose(N-N_i,k)) for N_i in data[i]))*(logchoose(N,k))**-1
File "C:\Users\user\Documents\Rarefactor\test.py", line 45, in logchoose
return (10**(lgn1 - (lgnk1 + lgk1)))
OverflowError: (34, 'Result too large')
Upvotes: 2
Views: 12487
Reputation: 365925
Your exception comes from this line:
return (10**(lgn1 - (lgnk1 + lgk1)))
You tried to fix it by using Decimal
like this:
return Decimal(10**(lgn1 - (lgnk1 + lgk1)))
But that won't help. Because lgn1
, lgnk1
, and lgk1
are float
values, you're trying to do the arithmetic with float
values, and then convert the result to a Decimal
after it's done. Because the float
arithmetic overflows, it never gets to the conversion.
What you need to do is make the arithmetic happen on Decimal
values in the first place. For example:
lgn1 = Decimal(sum(math.log10(ii) for ii in range(1,ni)))
lgk1 = Decimal(sum(math.log10(ii) for ii in range(1,ki)))
lgnk1 = Decimal(sum(math.log10(ii) for ii in range(1,ni-ki+1)))
Now, when you do this:
return (10**(lgn1 - (lgnk1 + lgk1)))
… you've got Decimal
arithmetic, not float
, and it won't overflow (as long as your Decimal
context is large enough for these numbers, of course).
But you probably want to push the Decimal
as high up the chain as possible, not as low as possible. In this case, that's only one level up—calling math.log10
on an integer gives you a float
, but calling the log10
method on a Decimal
gives you a Decimal
, so:
lgn1 = sum(Decimal(ii).log10() for ii in range(1, ni))
Meanwhile, for future reference:
I dont see how i can minimize this code further aside from erasing the lines of plotting.
Well, first, why not erase the lines of plotting then?
But, more importantly, you know that the exception happens on the last line of the logchoose
function, and you know (or could know, by, say, adding a print ni, ki
or running in the debugger) what arguments cause it to raise. So you could reduce the whole thing to the logchoose
definition plus print logchoose(273, 114)
(or whatever the arguments are).
Besides being a lot shorter, this would also completely take numpy
and matplotlib
out of the equation, so people who know nothing about those libraries but know a lot about Python (which is the vast majority, and includes people who are smarter than me, dbliss, and Nimrodshn, or at least smarter than me) could solve your problem.
Upvotes: 2