Python 2.7 memory leak with scipy.minimze

Question

During a fit procedure, my RAM memory slowly but steadily (about 2.8 mb every couple of seconds) increases until I get a memory error or I terminate the program. This happens when I try to fit some 80 measurements by fitting a model to them. This fitting is done by using scipy.minimze to minimize Chi_squared.

So far I've tried:

Playing with the Garbage collector to collect every time Chi_squared calls my model, didn't help.
Looking at all variables using global() and then using pympler.asizeof to find the total amount of space my variables take up, this first increases but then stays constant.
- The pympler.tracker.SummaryTracker also didn't show any increase of variable size.
I've also looked at the memory_profiler but didn't find anything relevant.

From these test, it seems that my RAM usage goes up while the total space my variables take up is constant. Where my memory goes I would really like to know.

The code below reproduces the problem for me:

import numpy as np
import scipy
import scipy.optimize as op
import scipy.stats
import scipy.integrate



def fit_model(model_pmt, x_list, y_list, PMT_parra, PMT_bounds=None, tolerance=10**-1, PMT_start_gues=None):
    result = op.minimize(chi_squared, PMT_start_gues, args=(x_list, y_list, model_pmt, PMT_parra[0], PMT_parra[1], PMT_parra[2]),
                     bounds=PMT_bounds, method='SLSQP', options={"ftol": tolerance})
    print result



def chi_squared(fit_parm, x, y_val, model, *non_fit_parm):
    parm = np.concatenate((fit_parm, non_fit_parm))
    y_mod = model(x, *parm)
    X2 = sum(pow(y_val - y_mod, 2))
    return X2



def basic_model(cb_list, max_intesity, sigma_e, noise, N, centre1, centre2, sigma_eb, min_dist=10**-5):
        """
        plateau function consisting of two gaussian CDF functions.
        """
        def get_distance(x, r):
            dist = abs(x - r)
            if dist < min_dist:
                dist = min_dist
            return dist

        def amount_of_material(x):
            A = scipy.stats.norm.cdf((x - centre1) / sigma_e)
            B = (1 - scipy.stats.norm.cdf((x - centre2) / sigma_e))
            cube =  A * B
            return cube

        def amount_of_field_INTEGRAL(x, cb):
        """Integral that is part of my sum"""
            result = scipy.integrate.quad(lambda r: scipy.stats.norm.pdf((r - cb) / sigma_b) / pow(get_distance(x, r), N),
                                          start, end, epsabs=10 ** -1)[0]
            return result



        # Set some constants, not important
        sigma_b = (sigma_eb**2-sigma_e**2)**0.5
        start, end = centre1 - 3 * sigma_e, centre2 + 3 * sigma_e
        integration_range = np.linspace(start, end, int(end - start) / 20)
        intensity_list = []

        # Doing a riemann sum, this is what takes the most time.
        for i, cb_point in enumerate(cb_list):
            intensity = sum([amount_of_material(x) * amount_of_field_INTEGRAL(x, cb_point) for x in integration_range])
            intensity *= (integration_range[1] - integration_range[0])
            intensity_list.append(intensity)


        model_values = np.array(intensity_list) / max(intensity_list)* max_intesity + noise
        return model_values


def get_dummy_data():
"""Can be ignored, produces something resembling my data with noise"""
    # X is just a range
    x_list = np.linspace(0, 300, 300)

    # Y is some sort of step function with noise
    A = scipy.stats.norm.cdf((x_list - 100) / 15.8)
    B = (1 - scipy.stats.norm.cdf((x_list - 200) / 15.8))
    y_list = A * B * .8 + .1 + np.random.normal(0, 0.05, 300)

    return x_list, y_list


if __name__=="__main__":
    # Set some variables
    start_pmt = [0.7, 8, 0.15, 0.6]
    pmt_bounds = [(.5, 1.3), (4, 15), (0.05, 0.3), (0.5, 3)]
    pmt_par = [110, 160, 15]
    x_list, y_list = get_dummy_data()

    fit_model(basic_model, x_list, y_list,  pmt_par, PMT_start_gues=start_pmt, PMT_bounds=pmt_bounds, tolerance=0.1)

Thanks for trying to help!

MB-F · Accepted Answer

I narrowed down the problem by successively removing layer after layer of indirection. (@joris267 This is something you really should have done yourself before asking.) The minimal remaining code to reproduce the problem looks like this:

import scipy.integrate

if __name__=="__main__":    
    while True:
        scipy.integrate.quad(lambda r: 0, 1, 100)

Conclusion:

Yes, there is e memory leak.
No, the leak is not in scipy.minimize but in scipy.quad.

However, this is a known issue with scipy 0.19.0. Upgrade to 0.19.1 should supposedly fix the problem, but I don't know for sure because I'm still with 0.19.0 myself :)

Update:

After upgrading scipy to 0.19.1 (and numpy to 1.13.3 for compatibility) the leak disapeared on my system.

Python 2.7 memory leak with scipy.minimze

Answers (1)

Related Questions