appleLover

Reputation: 15691

Python fmin too slow

I have a 3x2000 NumPy array in x_data and a 1x2000 NumPy array in y_data, which I pass to the function regress (below) to get a regression line. It works fine. The problem is that I am doing some backtesting, and to test 1000 situations I have to regress 1000 times, which takes about 5 minutes to run.

I tried standardizing the variables, but it didn't seem to make it any faster.

I also briefly tried fmin_powell and fmin_bfgs, which seemed to break it.

Any ideas? Thanks!

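import numpy as np
import scipy.optimize

def regress(x_data, y_data, fg_spread, fg_line):

    # Initial guess: one coefficient per row of x_data, all set to 0.11.
    theta = np.matrix(np.ones((1,x_data.shape[0]))*.11)
    # Logistic hypothesis; with np.matrix operands, theta*x is a matrix product.
    hyp = lambda theta, x: 1 / (1 + np.exp(-(theta*x)))
    # Cross-entropy cost (using log10), summed over all samples.
    cost_hyp = lambda theta, x, y: ((np.multiply(-y,np.log10(hyp(theta,x)))) - \
                            (np.multiply((1-y),(np.log10(1-hyp(theta, x)))))).sum()

    # Nelder-Mead simplex minimization of the cost.
    theta = scipy.optimize.fmin(cost_hyp, theta, args=(x_data,y_data), xtol=.00001, disp=0)

    # Predicted probability for a single new observation.
    return hyp(np.matrix(theta),np.matrix([1,fg_spread, fg_line]).reshape(3,1))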

Upvotes: 4

Views: 1776

Answers (1)

Nicolas Barbey

Reputation: 6797

Use numexpr to make your hyp and cost_hyp computations evaluate faster. The fmin family of functions calls them many times with different parameter values, so any speedup in those functions carries over directly to the minimization.

So for instance you would replace:

hyp = lambda theta, x: 1 / (1 + np.exp(-(theta*x)))

by:

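hyp = lambda theta, x: numexpr.evaluate("1 / (1 + exp(-z))", local_dict={"z": np.asarray(np.dot(theta, x))})

Numexpr works elementwise on NumPy arrays, so the matrix product theta*x is computed with np.dot first and only the elementwise part goes through numexpr.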
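
For reference, here is a minimal sketch of how both functions might look with numexpr on plain arrays. It assumes theta is a 1-D array of length 3 and x has shape (3, m), which is not exactly how the question sets things up (it uses np.matrix). The try/except fallback to NumPy is my own addition so the sketch still runs without numexpr installed.

```python
import numpy as np

try:
    import numexpr  # optional; the sketch falls back to NumPy without it
except ImportError:
    numexpr = None


def hyp(theta, x):
    """Logistic hypothesis for a 1-D theta (n,) and a 2-D x (n, m)."""
    theta = np.asarray(theta, dtype=float).ravel()
    x = np.asarray(x, dtype=float)
    # The matrix product stays in NumPy: numexpr's `*` is elementwise.
    z = np.dot(theta, x)
    if numexpr is None:
        return 1.0 / (1.0 + np.exp(-z))
    return numexpr.evaluate("1 / (1 + exp(-z))")


def cost_hyp(theta, x, y):
    """Same log10 cross-entropy cost as in the question, on plain arrays."""
    h = hyp(theta, x)
    y = np.asarray(y, dtype=float).ravel()
    if numexpr is None:
        return float((-y * np.log10(h) - (1 - y) * np.log10(1 - h)).sum())
    # The reduction has to be the outermost operation in a numexpr expression.
    return float(numexpr.evaluate("sum(-y * log10(h) - (1 - y) * log10(1 - h))"))
```

cost_hyp can be passed to scipy.optimize.fmin with args=(x_data, y_data) as in the question, starting from a 1-D theta such as np.full(3, 0.11). numexpr tends to pay off most on large arrays, so with only 2000 samples the gain may be modest; it is worth timing both versions on your own data.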

Upvotes: 2
