Reputation: 103
I'm running an iterative, recursive weighted least squares regression. There are two major parts: finding the weights and fitting the regression.
The fit uses statsmodels.regression.linear_model.WLS.fit (2 matrix multiplications, 1 matrix inversion and 3 other matrix multiplications) and takes around 3 ms.
Finding the weights consists of subtracting two arrays, dividing each element by a scalar, squaring each element, negating, adding 1, and taking the maximum of each element and 0 (an Epanechnikov kernel on the standardised errors of the fit):
err = y - y_hat
h = np.std(err) * c
w = np.maximum(0, 1 - (err / h)**2)
but it takes 30 ms. I don't understand why the matrix inversion would take 10 times less time than this. We are talking about 3000x3000 matrices and 3000x1 arrays (y, y_hat, err and w); c is a scalar that depends on the size (a function of 3000 in this example). The third line is the most expensive part (>80% of the calculation time).
Now this doesn't seem like a lot, but I have to do a whole lot of these.
How can I accelerate this process?
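For reference, one common way to shave time off element-wise NumPy pipelines like the one above is to cut down on temporary arrays by computing the ratio once and using in-place ufunc calls. This is a minimal sketch of the weight step under that idea (names `y`, `y_hat`, `c` as in the question; whether it meaningfully reduces the 30 ms depends on the machine and array sizes):

```python
import numpy as np

def epanechnikov_weights(y, y_hat, c):
    """Epanechnikov kernel weights on the standardised fit errors.

    Sketch that reuses one buffer via in-place ufuncs instead of
    allocating a fresh array for every intermediate result.
    """
    err = y - y_hat
    h = err.std() * c
    r = err / h                   # standardised errors
    r *= r                        # square in place
    np.subtract(1.0, r, out=r)    # 1 - r**2, same buffer
    np.maximum(r, 0.0, out=r)     # clip negatives to zero
    return r

# Example on arrays of the size mentioned in the question
rng = np.random.default_rng(0)
y = np.linspace(0.0, 1.0, 3000)
y_hat = y + rng.normal(scale=0.1, size=3000)
w = epanechnikov_weights(y, y_hat, c=1.0)
```

The result is numerically identical to `np.maximum(0, 1 - (err / h)**2)`; only the number of intermediate allocations changes.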
Upvotes: 0
Views: 207
Reputation: 821
A small point, but you have **2; what about (err/h)*(err/h)?
I suspected that ** was more costly. In Spyder, if I define two functions:
def function_a(i):
    a = i**2
    return a

def function_b(i):
    b = i*i
    return b
and run them from the console, I get these results; plain multiplication is about 3x faster.
%timeit for x in range(100): function_a(x)
30.2 µs ± 1.12 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit for x in range(100): function_b(x)
11.5 µs ± 185 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
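Applied to the vectorized kernel line from the question, the same idea would look like this (a sketch; `err` and `h` stand in for the question's arrays, and the numerical result is unchanged):

```python
import numpy as np

rng = np.random.default_rng(0)
err = rng.standard_normal(3000)
h = np.std(err) * 1.0

# Original formulation with the power operator
w_pow = np.maximum(0, 1 - (err / h)**2)

# Compute the ratio once and square by self-multiplication
r = err / h
w_mul = np.maximum(0, 1 - r * r)
```

Whether this helps on large arrays is worth timing: NumPy special-cases small integer exponents in the `**` ufunc, so the gap seen for Python scalars may shrink or vanish for 3000-element arrays.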
Upvotes: 1