Reputation: 15691
I have a 3x2000 numpy array in x_data and a 1x2000 numpy array in y_data, which I pass to this function regress to give me a regression line. It works fine. The problem is that I am trying to do some backtesting, and to test 1000 situations I have to run the regression 1000 times, which takes about 5 minutes.
I tried standardizing the variables, but it didn't seem to make it any faster.
I also briefly tried fmin_powell and fmin_bfgs, which seemed to break it.
Any ideas? Thanks!
import numpy as np
import scipy.optimize

def regress(x_data, y_data, fg_spread, fg_line):
    theta = np.matrix(np.ones((1, x_data.shape[0])) * .11)
    # sigmoid hypothesis
    hyp = lambda theta, x: 1 / (1 + np.exp(-(theta * x)))
    # logistic log-loss summed over all samples
    cost_hyp = lambda theta, x, y: ((np.multiply(-y, np.log10(hyp(theta, x)))) -
                                    (np.multiply((1 - y), (np.log10(1 - hyp(theta, x)))))).sum()
    theta = scipy.optimize.fmin(cost_hyp, theta, args=(x_data, y_data), xtol=.00001, disp=0)
    return hyp(np.matrix(theta), np.matrix([1, fg_spread, fg_line]).reshape(3, 1))
Upvotes: 4
Views: 1776
Reputation: 6797
Use numexpr to make your hyp and cost_hyp computations evaluate faster. The fmin family of functions evaluates those functions many times for different inputs, so any gain in those functions is reflected directly in the minimization.
So, for instance, you would replace:
hyp = lambda theta, x: 1 / (1 + np.exp(-(theta*x)))
by:
hyp = lambda theta, x: numexpr.evaluate("1 / (1 + exp(-(theta*x)))")
numexpr is meant to work with numpy arrays.
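Note that numexpr evaluates elementwise expressions (plus reductions such as sum), so the matrix product theta*x still has to be computed with NumPy before the result is handed to numexpr. Here is a minimal sketch of the whole function along those lines, assuming x_data and y_data are plain ndarrays (the name regress_ne and the float() cast are my own additions, not part of your code):

import numpy as np
import numexpr as ne
import scipy.optimize

def regress_ne(x_data, y_data, fg_spread, fg_line):
    # work with plain ndarrays: x is 3 x N, y is flattened to length N
    x = np.asarray(x_data, dtype=np.float64)
    y = np.asarray(y_data, dtype=np.float64).ravel()
    theta0 = np.ones(x.shape[0]) * 0.11

    def cost_hyp(theta, x, y):
        z = theta.dot(x)                              # matrix product stays in NumPy
        h = ne.evaluate("1 / (1 + exp(-z))")          # elementwise sigmoid in numexpr
        # logistic log-loss, fused and summed inside numexpr
        return float(ne.evaluate("sum(-y * log10(h) - (1 - y) * log10(1 - h))"))

    theta = scipy.optimize.fmin(cost_hyp, theta0, args=(x, y), xtol=1e-5, disp=0)
    z = theta.dot(np.array([1.0, fg_spread, fg_line]))
    return 1.0 / (1.0 + np.exp(-z))

numexpr avoids allocating large intermediate temporaries and can evaluate the expression with multiple threads, which is where the speedup for expressions like this comes from.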
Upvotes: 2