Reputation: 5408
I am trying to find an appropriate function for the permeability of cells under varying conditions. If I assume constant permeability, I can fit it to the experimental data and use sklearn's PolynomialFeatures
together with a LinearModel
(as explained in this post) to determine a correlation between the conditions and the permeability. However, the permeability is not constant, so now I am trying to fit my model with the permeability as a function of the process conditions. The PolynomialFeatures
module of sklearn is quite convenient for this.
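For context, a minimal sketch of that approach might look like this (the data and the polynomial degree here are made up purely for illustration):

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Made-up process conditions and permeability values, for illustration only.
X = np.random.rand(50, 2)
y = 3.0*X[:, 0]*X[:, 1] + 0.5*X[:, 0]

# PolynomialFeatures generates the interaction terms (e.g. x0*x1),
# and a linear model fits the coefficients.
poly = PolynomialFeatures(degree=2)
model = LinearRegression().fit(poly.fit_transform(X), y)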
Is there an equivalent function within scipy or numpy that allows me to create a polynomial model of varying order (including interaction terms, e.g. a*x[0]*x[1]
) without writing the whole function by hand?
The standard polynomial class in numpy does not seem to support interaction terms.
Upvotes: 2
Views: 409
Reputation: 4039
I'm not aware of a function that does exactly what you need, but you can achieve it using a combination of itertools
and numpy
.
If you have n_features
predictor variables, you essentially have to generate all vectors of length n_features
whose entries are non-negative integers and sum to the specified order. Each new feature column is the component-wise power of the old features using one of these vectors.
For example, if order = 3
and n_features = 2
, one of the new features will be the old features raised to the respective powers [2,1]
. I've written some code below for arbitrary order and number of features. I've adapted the generation of the vectors that sum to order
from this post.
import itertools
import numpy as np
from scipy.special import binom
def polynomial_features_with_cross_terms(X, order):
    """
    X: numpy ndarray
        Matrix of shape `(n_samples, n_features)` to be transformed.
    order: integer
        Order of the polynomial features to be computed.

    returns: T, powers
        `T` is a matrix of shape `(n_samples, n_poly_features)`.
        Note that `n_poly_features` is equal to
        `n_features+order-1` choose `n_features-1`, see:
        https://en.wikipedia.org/wiki/Stars_and_bars_%28combinatorics%29#Theorem_two
        `powers` is a matrix of shape `(n_features, n_poly_features)`.
        Each column specifies, row by row, the power of the respective
        feature in the respective column of `T`.
    """
    n_samples, n_features = X.shape
    n_poly_features = int(binom(n_features+order-1, n_features-1))
    powers = np.zeros((n_features, n_poly_features))
    T = np.zeros((n_samples, n_poly_features), dtype=X.dtype)

    # Each combination of "bar" positions corresponds, via stars and bars,
    # to one vector of non-negative integer powers that sums to `order`.
    combos = itertools.combinations(range(n_features+order-1), n_features-1)
    for i, c in enumerate(combos):
        # The gaps between consecutive bars give the exponent of each feature.
        powers[:, i] = np.array([
            b-a-1 for a, b in zip((-1,)+c, c+(n_features+order-1,))
        ])
        # The new feature column is the product of the features raised to these powers.
        T[:, i] = np.prod(np.power(X, powers[:, i]), axis=1)

    return T, powers
Here's some example usage:
>>> X = np.arange(-5, 5).reshape(5, 2)
>>> T, p = polynomial_features_with_cross_terms(X, order=3)
>>> print(X)
[[-5 -4]
 [-3 -2]
 [-1  0]
 [ 1  2]
 [ 3  4]]
>>> print(p)
[[0. 1. 2. 3.]
 [3. 2. 1. 0.]]
>>> print(T)
[[ -64  -80 -100 -125]
 [  -8  -12  -18  -27]
 [   0    0    0   -1]
 [   8    4    2    1]
 [  64   48   36   27]]
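Since the question is ultimately about fitting, here is a small follow-up sketch (the target vector y is made up purely for illustration): the generated features can be fed straight into an ordinary least-squares fit. Note that T only contains the monomials whose total degree equals order; stacking the matrices for orders 0 up to the maximum order gives the full polynomial basis with interaction terms, similar to PolynomialFeatures.

import numpy as np

X = np.arange(-5, 5).reshape(5, 2)
T, p = polynomial_features_with_cross_terms(X, order=3)

# Made-up target values, for illustration only.
y = np.array([1.0, 2.0, 0.0, 2.0, 5.0])

# Ordinary least-squares fit of the coefficients of the degree-3 terms.
coeffs = np.linalg.lstsq(T.astype(float), y, rcond=None)[0]
print(coeffs)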
Finally, I should mention that the SVM polynomial kernel achieves exactly this effect without explicitly computing the polynomial map. There are of course pros and cons to this, but I figured I should mention it for you to consider if you haven't already.
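If that route is of interest, a minimal sketch with sklearn's SVR could look like the following (the target values are again made up for illustration, and the poly-kernel parameters are just one possible choice):

import numpy as np
from sklearn.svm import SVR

X = np.arange(-5, 5).reshape(5, 2).astype(float)
# Made-up target values, for illustration only.
y = np.array([1.0, 2.0, 0.0, 2.0, 5.0])

# The polynomial kernel (gamma*<x, x'> + coef0)**degree lets the SVM use the
# monomial features implicitly, so the feature matrix is never built explicitly.
svr = SVR(kernel='poly', degree=3, gamma=1.0, coef0=0.0)
svr.fit(X, y)
print(svr.predict(X))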
Upvotes: 1