Peter
Peter

Reputation: 377

Fitting piecewise function: running into NaN problems due to boolean array indexing

So I've been trying to fit to an exponentially modified gaussian function (if interested, https://en.wikipedia.org/wiki/Exponentially_modified_Gaussian_distribution)

import numpy as np
import scipy.optimize as sio
import scipy.special as sps

def exp_gaussian(x, h, u, c, t):
    z = 1.0/sqrt(2.0) * (c/t - (x-u)/c)   #not important
    k1 = k2 = h * c / t * sqrt(pi / 2)    #not important
    n1 = 1/2 * (c / t)**2 - (x-u)/t       #not important
    n2 = -1 / 2 * ((x - u) / c)**2        #not important
    y = np.zeros(len(x))
    y += (k1 * np.exp(n1) * sps.erfc(z)) * (z < 0)
    y += (k2 * np.exp(n2) * sps.erfcx(z)) * (z >= 0)
    return y

In order to prevent overflow problems, one of two equivilent functions must be used depending on whether z is positive or negative (see Alternative forms for computation from previous wikipedia page).

The problem I am having is this: The line y += (k2 * np.exp(n2) * sps.erfcx(z)) * (z >= 0)is only supposed to add to y when z is positive. But if z is, say, -30, sps.erfcx(-30) is inf, and inf * False is NaN. Therefore, instead of leaving y untouched, the resulting y is clustered with NaN. Example:

x = np.linspace(400, 1000, 1001)
y = exp_gaussian(x, 100, 400, 10, 5)
y
array([ 84.27384586,  86.04516723,  87.57518493, ...,          nan,
                nan,          nan])

I tried the replacing the line in question with the following:

y += numpy.nan_to_num((k2 * np.exp(n2) * sps.erfcx(z)) * (z >= 0))

But doing this ran into serious runtime issues. Is there a way to only evaluate (k2 * np.exp(n2) * sps.erfcx(z)) on the condition that (z >= 0) ? Is there some other way to solve this without sacrificing runtime?

Thanks!

EDIT: After Rishi's advice, the following code seems to work much better:

def exp_gaussian(x, h, u, c, t):
    z = 1.0/sqrt(2.0) * (c/t - (x-u)/c)
    k1 = k2 = h * c / t * sqrt(pi / 2)
    n1 = 1/2 * (c / t)**2 - (x-u)/t
    n2 = -1 / 2 * ((x - u) / c)**2
    return = np.where(z >= 0, k2 * np.exp(n2) * sps.erfcx(z), k1 * np.exp(n1) * sps.erfc(z))

Upvotes: 0

Views: 182

Answers (2)

Rishi
Rishi

Reputation: 131

How about using numpy.where with something like: np.where(z >= 0, sps.erfcx(z), sps.erfc(z)). I'm no numpy expert, so don't know if it's efficient. Looks elegant at least!

Upvotes: 3

Aguy
Aguy

Reputation: 8059

One thing you could do is create a mask and reuse it so it wouldn't need to be evaluated twice. Another idea is to use the nan_to_num only once at the end

mask = (z<0)
y += (k1 * np.exp(n1) * sps.erfc(z)) * (mask)
y += (k2 * np.exp(n2) * sps.erfcx(z)) * (~mask)
y = numpy.nan_yo_num(y)

Try and see if this helps...

Upvotes: 0

Related Questions