Moritz
Moritz

Reputation: 5408

Scipy fitting polynomial model to some data

I do try to find an appropriate function for the permeability of cells under varying conditions. If I assume constant permeability, I can fit it to the experimental data and use Sklearns PolynomialFeatures together with a LinearModel (As explained in this post) in order to determine a correlation between the conditions and the permeability. However, the permeability is not constant and now I try to fit my model with the permeability as a function of the process conditions. The PolynomialFeature module of sklearn is quite nice to use.

Is there an equivalent function within scipy or numpy which allows me to create a polynomial model (including interaction terms e.g. a*x[0]*x[1] etc.) of varying order without writing the whole function by hand ?

The standard polynomial class in numpy seems not to support interaction terms.

Upvotes: 2

Views: 409

Answers (1)

Matt Hancock
Matt Hancock

Reputation: 4039

I'm not aware of such a function that does exactly what you need, but you can achieve it using a combination of itertools and numpy.

If you have n_features predictor variables, you essentially must generate all vectors of length n_features whose entries are non-negative integers and sum to the specified order. Each new feature column is the component-wise power using these vectors who sum to a given order.

For example, if order = 3 and n_features = 2, one of the new features will be the old features raise to the respective powers, [2,1]. I've written some code below for arbitrary order and number of features. I've modified the generation of vectors who sum to order from this post.

import itertools
import numpy as np
from scipy.special import binom 

def polynomial_features_with_cross_terms(X, order):
    """
    X: numpy ndarray
        Matrix of shape, `(n_samples, n_features)`, to be transformed.
    order: integer, default 2
        Order of polynomial features to be computed. 

    returns: T, powers.
        `T` is a matrix of shape,  `(n_samples, n_poly_features)`.
        Note that `n_poly_features` is equal to:

           `n_features+order-1` Choose `n_features-1`

        See: https://en.wikipedia.org\
        /wiki/Stars_and_bars_%28combinatorics%29#Theorem_two

        `powers` is a matrix of shape, `(n_features, n_poly_features)`.
        Each column specifies the power by row of the respective feature, 
        in the respective column of `T`.
    """
    n_samples, n_features = X.shape
    n_poly_features = int(binom(n_features+order-1, n_features-1))
    powers = np.zeros((n_features, n_poly_features))
    T = np.zeros((n_samples, n_poly_features), dtype=X.dtype)

    combos = itertools.combinations(range(n_features+order-1), n_features-1)
    for i,c in enumerate(combos):
        powers[:,i] = np.array([
            b-a-1 for a,b in zip((-1,)+c, c+(n_features+order-1,))
        ])

        T[:,i] = np.prod(np.power(X, powers[:,i]), axis=1)

    return T, powers

Here's some example usage:

>>> X = np.arange(-5,5).reshape(5,2)
>>> T,p = polynomial_features_with_cross_terms(X, order=3)
>>> print X
[[-5 -4]
 [-3 -2]
 [-1  0]
 [ 1  2]
 [ 3  4]]
>>> print p
[[ 0.  1.  2.  3.]
 [ 3.  2.  1.  0.]]
>>> print T
[[ -64  -80 -100 -125]
 [  -8  -12  -18  -27]
 [   0    0    0   -1]
 [   8    4    2    1]
 [  64   48   36   27]]

Finally, I should mention that the SVM polynomial kernel achieves exactly this effect without explicitly computing the polynomial map. There are of course pro's and con's to this, but I figured I should mentioned it for you to consider if you have not, yet.

Upvotes: 1

Related Questions