Logan Davis
Logan Davis

Reputation: 117

SciPy curve_fit with np.log returns immediately with popt = p0, pcov = inf

I'm trying to optimize a logarithmic fit to a data set with scipy.optimize.curve_fit. Before trying it on an actual data set, I wrote code to run on a dummy data set.

def do_fitting():
    x = np.linspace(0, 4, 100)
    y = func(x, 1.1, .4, 5)
    y2 = y + 0.2 * np.random.normal(size=len(x))

    popt, pcov = curve_fit(func, x, y2, p0=np.array([2, 0.5, 1]))

    plt.figure()
    plt.plot(x, y, 'bo', label="Clean Data")
    plt.plot(x, y2, 'ko', label="Fuzzed Data")
    plt.plot(x, func(x, *popt), 'r-', label="Fitted Curve")
    plt.legend()
    plt.show()

Of course, do_fitting() relies on func(), which it passes to curve_fit. Here's the problem. When I pass a func() that contains np.log, i.e. the function that I actually want to fit to, curve_fit declares that p0 (the initial condition) is the optimal solution and returns immediately with an infinite covariance.

Here's what happens if I run do_fitting() with a non-logarithmic func():

def func(x, a, b, c):
    return a * np.exp(x*b) + c

exponential fit

popt = [ 0.90894173  0.44279212  5.19928151]
pcov = [[ 0.02044817 -0.00471525 -0.02601574]
        [-0.00471525  0.00109879  0.00592502]
        [-0.02601574  0.00592502  0.0339901 ]]

Here's what happens when I run do_fitting() with a logarithmic func():

def func(x, a, b, c):
    return a * np.log(x*b) + c

logarithmic fit

popt = [ 2.   0.5  1. ]
pcov = inf

You'll notice that the logarithmic solution for popt is equal to the value I gave curve_fit for p0 in the above do_fitting(). This is true, and pcov is infinite, for every value of p0 I have tried.

What am I doing wrong here?

Upvotes: 1

Views: 3604

Answers (2)

mvanvleet
mvanvleet

Reputation: 91

I was able to fit a logarithmic function just fine using the following code (hardly modified from your original):

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def func(x, a, b, c):
    return a * np.log(x+b) + c

def do_fitting():
    x = np.linspace(0, 4, 100)
    y = func(x, 1.1, .4, 5)
    y2 = y + 0.2 * np.random.normal(size=len(x))

    popt, pcov = curve_fit(func, x, y2, p0=np.array([2, 0.5, 1]))

    plt.figure()
    plt.plot(x, y, 'bo', label="Clean Data")
    plt.plot(x, y2, 'ko', label="Fuzzed Data")
    plt.plot(x, func(x, *popt), 'r-', label="Fitted Curve")
    plt.legend()
    plt.show()

do_fitting()

(Unfortunately I can't post a picture of the final fit, but it agrees quite nicely with the clean data).

Likely your problem is not the logarithm itself, but some difficulty curve_fit is having with the specific function you're trying to fit. Can you edit your question to provide an example of the exact logarithmic function you're trying to fit?

EDIT: The function you provided is not well-defined for x=0, and produces a RuntimeWarning upon execution. curve_fit is not good at handling NaNs, and will not be able to fit the function in this case. If you change x to

x = np.linspace(1, 4, 100)

curve_fit performs just fine.

Upvotes: 2

ali_m
ali_m

Reputation: 74172

The problem is very simple - since the first value in your x array is 0, you are taking the log of 0, which is equal to -inf:

x = np.linspace(0, 4, 100)
p0 = np.array([2, 0.5, 1])

print(func(x, *p0).min())
# -inf

Upvotes: 3

Related Questions