Jzl5325
Jzl5325

Reputation: 3974

NumPy PolyFit and PolyVal in Multiple Dimensions?

Assume an n-dimensional array of observations that are reshaped to be a 2d-array with each row being one observation set. Using this reshape approach, np.polyfit can compute 2nd order fit coefficients for the entire ndarray (vectorized):

fit = np.polynomial.polynomialpolyfit(X, Y, 2)

where Y is shape (304000, 21) and X is a vector. This results in a (304000,3) array of coefficients, fit.

Using an iterator it is possible to call np.polyval(fit, X) for each row. This is inefficient when a vectorized approach may exist. Could the fit result be applied to the entire observation array without iterating? If so, how?

This is along the lines of this SO question.

Upvotes: 8

Views: 15639

Answers (2)

Ryan M
Ryan M

Reputation: 751

np.polynomial.polynomial.polyval is a perfectly fine (and convenient) approach to efficient evaluation of polynomial fittings.

However, if 'speediest' is what you are looking for, simply constructing the polynomial inputs and using the rudimentary numpy matrix multiplication functions results in slightly faster ( roughly 4x faster) computational speeds.

Setup

Using the same setup as above, we'll create 25 different line fittings.

>>> num_samples = 100000
>>> num_lines = 100
>>> x = np.random.randint(0,100,num_samples)
>>> y = np.random.randint(0,100,(num_samples, num_lines))
>>> fit = np.polyfit(x,y,deg=2)
>>> xx = np.random.randint(0,100,num_samples*10)

Numpy's polyval Function

res1 = np.polynomial.polynomial.polyval(xx, fit)

Basic Matrix Multiplication

inputs = np.array([np.power(xx,d) for d in range(len(fit))])
res2 = fit.T.dot(inputs)

Timing the functions

Using the same parameters above...

%timeit _ = np.polynomial.polynomial.polyval(xx, fit)
1 loop, best of 3: 247 ms per loop

%timeit inputs = np.array([np.power(xx, d) for d in range(len(fit))]);_ = fit.T.dot(inputs)
10 loops, best of 3: 72.8 ms per loop

To beat a dead horse...

enter image description here

mean Efficiency bump of ~3.61x faster. Speed fluctuations probably come from random computer processes in background.

Upvotes: 0

Jaime
Jaime

Reputation: 67457

np.polynomial.polynomial.polyval takes multidimensional coefficient arrays:

>>> x = np.random.rand(100)
>>> y = np.random.rand(100, 25)
>>> fit = np.polynomial.polynomial.polyfit(x, y, 2)
>>> fit.shape # 25 columns of 3 polynomial coefficients
(3L, 25L)
>>> xx = np.random.rand(50)
>>> interpol = np.polynomial.polynomial.polyval(xx, fit)
>>> interpol.shape # 25 rows, each with 50 evaluations of the polynomial
(25L, 50L)

And of course:

>>> np.all([np.allclose(np.polynomial.polynomial.polyval(xx, fit[:, j]),
...                     interpol[j]) for j in range(25)])
True

Upvotes: 8

Related Questions