Globoquadrina

Reputation: 17

Create polynomial feature matrix

I am trying to build a polynomial feature matrix in R, similar to what Python's sklearn PolynomialFeatures produces. Unfortunately, I could not find an existing R package with a similar function. I also don't understand the underlying statistics of such a feature matrix, so any help or pointers are very much appreciated!

The sklearn docs explain it as: "Generate a new feature matrix consisting of all polynomial combinations of the features with degree less than or equal to the specified degree. For example, if an input sample is two dimensional and of the form [a, b], the degree-2 polynomial features are [1, a, b, a^2, ab, b^2]."
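
To make the quoted definition concrete, here is a minimal standard-library sketch of how such features can be enumerated. The helper name `polynomial_features` is hypothetical and not part of sklearn; the sketch assumes the columns are ordered by total degree and then by feature index, which is the order in the [1, a, b, a^2, ab, b^2] example.

```python
from itertools import combinations_with_replacement
from math import log, prod


def polynomial_features(sample, degree):
    """Return all monomials of the entries of `sample` up to `degree`.

    Hypothetical helper for illustration; not part of sklearn.
    """
    features = []
    for d in range(degree + 1):
        # Each combination picks d feature indices (repeats allowed);
        # the product of the picked values is one monomial of total degree d.
        for idx in combinations_with_replacement(range(len(sample)), d):
            features.append(prod(sample[i] for i in idx))
    return features


# Degree-2 features of [a, b] = [2, 3]: [1, a, b, a^2, ab, b^2]
print(polynomial_features([2, 3], 2))  # [1, 2, 3, 4, 6, 9]

# The five features from the question, degree 3
row = polynomial_features([298, log(298), 35, 0.05, 0.01], 3)
print(len(row))  # 56
```

With 5 input features and degree 3 there are 56 monomials in total (including the constant), which is the number of entries in the array shown in the question.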

The Python code I am trying to replicate is the following:

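import numpy as np
from sklearn.preprocessing import PolynomialFeatures

x1 = 298
x2 = 35
x3 = 0.05
x4 = 0.01

# One sample (row) with five features (columns)
X = np.vstack([x1, np.log(x1), x2, x3, x4]).T

poly = PolynomialFeatures(degree=3)
X_ = poly.fit_transform(X)

X_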

array([[1.00000000e+00, 2.98000000e+02, 5.69709349e+00, 3.50000000e+01,
        5.00000000e-02, 1.00000000e-02, 8.88040000e+04, 1.69773386e+03,
        1.04300000e+04, 1.49000000e+01, 2.98000000e+00, 3.24568742e+01,
        1.99398272e+02, 2.84854674e-01, 5.69709349e-02, 1.22500000e+03,
        1.75000000e+00, 3.50000000e-01, 2.50000000e-03, 5.00000000e-04,
        1.00000000e-04, 2.64635920e+07, 5.05924690e+05, 3.10814000e+06,
        4.44020000e+03, 8.88040000e+02, 9.67214851e+03, 5.94206851e+04,
        8.48866929e+01, 1.69773386e+01, 3.65050000e+05, 5.21500000e+02,
        1.04300000e+02, 7.45000000e-01, 1.49000000e-01, 2.98000000e-02,
        1.84909847e+02, 1.13599060e+03, 1.62284371e+00, 3.24568742e-01,
        6.97893952e+03, 9.96991360e+00, 1.99398272e+00, 1.42427337e-02,
        2.84854674e-03, 5.69709349e-04, 4.28750000e+04, 6.12500000e+01,
        1.22500000e+01, 8.75000000e-02, 1.75000000e-02, 3.50000000e-03,
        1.25000000e-04, 2.50000000e-05, 5.00000000e-06, 1.00000000e-06]])

Upvotes: 0

Views: 799

Answers (1)

Onyambu

Reputation: 79238

Use poly, e.g.:

c(1, poly(t(X), degree = 3, raw = TRUE))

Note that the ordering will be different.
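
As a rough illustration of the ordering difference, the sketch below lists the exponent tuple behind each column in both layouts. The function names are hypothetical, and the R ordering (the first feature's power varying fastest, with no constant column) is an assumption about how poly handles a matrix with raw = TRUE, so check it against the column names R actually returns.

```python
from collections import Counter
from itertools import combinations_with_replacement, product


def sklearn_order(n_features, degree):
    """Exponent tuples ordered by total degree (PolynomialFeatures layout)."""
    out = []
    for d in range(degree + 1):
        for idx in combinations_with_replacement(range(n_features), d):
            counts = Counter(idx)
            out.append(tuple(counts[i] for i in range(n_features)))
    return out


def r_polym_order(n_features, degree):
    """Exponent tuples with the first feature's power varying fastest.

    Assumed to match R's poly(..., raw = TRUE) on a matrix; no constant term.
    """
    grid = (t[::-1] for t in product(range(degree + 1), repeat=n_features))
    return [t for t in grid if 0 < sum(t) <= degree]


sk = sklearn_order(2, 2)
r = r_polym_order(2, 2)
print(sk)  # [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]
print(r)   # [(1, 0), (2, 0), (0, 1), (1, 1), (0, 2)]

# Index into c(1, poly(...)) for each sklearn column (0-based)
r_with_const = [(0, 0)] + r
perm = [r_with_const.index(t) for t in sk]
print(perm)  # [0, 1, 3, 2, 4, 5]
```

Under that assumption, indexing the R vector with `perm + 1` (R is 1-based) would give the sklearn column order.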

Also note that the Python code is incorrect. If X is a column, do not transpose it. In that case you will get the same, correct values in each language:

poly.fit_transform(X.T) # Original X before transpose   
array([[1.00000000e+00, 2.98000000e+02, 8.88040000e+04, 2.64635920e+07],
       [1.00000000e+00, 5.69709349e+00, 3.24568742e+01, 1.84909847e+02],
       [1.00000000e+00, 3.50000000e+01, 1.22500000e+03, 4.28750000e+04],
       [1.00000000e+00, 5.00000000e-02, 2.50000000e-03, 1.25000000e-04],
       [1.00000000e+00, 1.00000000e-02, 1.00000000e-04, 1.00000000e-06]])

In R:

x1 <- 298; x2 <- 35; x3 <- 0.05; x4 <- 0.01  # values from the question
X <- c(x1, log(x1), x2, x3, x4)
cbind(intercept = 1, poly(X, 3, raw = TRUE))
     intercept          1           2            3
[1,]         1 298.000000 88804.00000 2.646359e+07
[2,]         1   5.697093    32.45687 1.849098e+02
[3,]         1  35.000000  1225.00000 4.287500e+04
[4,]         1   0.050000     0.00250 1.250000e-04
[5,]         1   0.010000     0.00010 1.000000e-06

Upvotes: 1
