Reputation: 17
I am trying to build a polynomial feature matrix in R, similar to what scikit-learn's PolynomialFeatures produces in Python. Unfortunately, I could not find an existing package with a similar function. I also don't understand the underlying statistics of such a feature matrix, so any help or pointers are very much appreciated!
The sklearn docs explain it as: Generate a new feature matrix consisting of all polynomial combinations of the features with degree less than or equal to the specified degree. For example, if an input sample is two dimensional and of the form [a, b], the degree-2 polynomial features are [1, a, b, a^2, ab, b^2].
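To make the definition concrete, here is a minimal pure-Python sketch of how such a feature row can be generated. The helper name `polynomial_features` is hypothetical, not part of sklearn; it multiplies together every combination (with repetition) of the inputs for each degree from 0 up to the requested one, which yields the [1, a, b, a^2, ab, b^2] pattern from the docs.

```python
from itertools import combinations_with_replacement
from math import prod  # Python 3.8+


def polynomial_features(row, degree):
    """Return all monomials of the values in `row` up to `degree`, constant term first.

    Hypothetical helper for illustration; not sklearn's API.
    """
    features = []
    for d in range(degree + 1):
        # Each combination picks d inputs (repetition allowed) to multiply together.
        # The empty combination (d = 0) gives the constant term 1.
        for combo in combinations_with_replacement(row, d):
            features.append(prod(combo))
    return features


a, b = 2, 3
print(polynomial_features([a, b], 2))  # [1, 2, 3, 4, 6, 9]
```

With a = 2 and b = 3 this gives 1, a, b, a^2, ab, b^2 in that order. For a single row of five values and degree 3 the same loop produces 56 terms.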
The Python code I am trying to replicate is the following:
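import numpy as np
from sklearn.preprocessing import PolynomialFeatures

x1 = 298
x2 = 35
x3 = 0.05
x4 = 0.01
# vstack stacks the scalars into a 5x1 column; .T makes it a 1x5 row
X = np.vstack([x1, np.log(x1), x2, x3, x4]).T
poly = PolynomialFeatures(degree=3)
X_ = poly.fit_transform(X)
X_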
array([[1.00000000e+00, 2.98000000e+02, 5.69709349e+00, 3.50000000e+01,
5.00000000e-02, 1.00000000e-02, 8.88040000e+04, 1.69773386e+03,
1.04300000e+04, 1.49000000e+01, 2.98000000e+00, 3.24568742e+01,
1.99398272e+02, 2.84854674e-01, 5.69709349e-02, 1.22500000e+03,
1.75000000e+00, 3.50000000e-01, 2.50000000e-03, 5.00000000e-04,
1.00000000e-04, 2.64635920e+07, 5.05924690e+05, 3.10814000e+06,
4.44020000e+03, 8.88040000e+02, 9.67214851e+03, 5.94206851e+04,
8.48866929e+01, 1.69773386e+01, 3.65050000e+05, 5.21500000e+02,
1.04300000e+02, 7.45000000e-01, 1.49000000e-01, 2.98000000e-02,
1.84909847e+02, 1.13599060e+03, 1.62284371e+00, 3.24568742e-01,
6.97893952e+03, 9.96991360e+00, 1.99398272e+00, 1.42427337e-02,
2.84854674e-03, 5.69709349e-04, 4.28750000e+04, 6.12500000e+01,
1.22500000e+01, 8.75000000e-02, 1.75000000e-02, 3.50000000e-03,
1.25000000e-04, 2.50000000e-05, 5.00000000e-06, 1.00000000e-06]])
Upvotes: 0
Views: 799
Reputation: 79238
Use poly, e.g.:
c(1, poly(t(X), degree = 3, raw = TRUE))
Note that the ordering will be different.
Also note that the Python code is incorrect. If X is a column, then do not transpose it. In that case you will get the same values from both languages:
poly.fit_transform(X.T) # Original X before transpose
array([[1.00000000e+00, 2.98000000e+02, 8.88040000e+04, 2.64635920e+07],
[1.00000000e+00, 5.69709349e+00, 3.24568742e+01, 1.84909847e+02],
[1.00000000e+00, 3.50000000e+01, 1.22500000e+03, 4.28750000e+04],
[1.00000000e+00, 5.00000000e-02, 2.50000000e-03, 1.25000000e-04],
[1.00000000e+00, 1.00000000e-02, 1.00000000e-04, 1.00000000e-06]])
In R:
X <- c(x1, log(x1), x2, x3, x4)
cbind(intercept = 1, poly(X, 3, raw = TRUE))
     intercept          1           2            3
[1,]         1 298.000000 88804.00000 2.646359e+07
[2,]         1   5.697093    32.45687 1.849098e+02
[3,]         1  35.000000  1225.00000 4.287500e+04
[4,]         1   0.050000     0.00250 1.250000e-04
[5,]         1   0.010000     0.00010 1.000000e-06
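The two output shapes in this thread (56 columns in the question, 4 columns per row here) follow from a simple count. This is a small sketch with a hypothetical helper name, `n_output_features`, which assumes the constant term is included: the number of monomials of total degree at most d in n variables is the binomial coefficient C(n + d, d).

```python
from math import comb  # Python 3.8+


def n_output_features(n_features, degree):
    # Number of monomials of total degree <= degree in n_features variables,
    # counting the constant term. Hypothetical helper for illustration.
    return comb(n_features + degree, degree)


print(n_output_features(5, 3))  # 56: one sample with five features
print(n_output_features(1, 3))  # 4: each sample has a single feature
```

So whether the input is treated as one row of five features or as five rows of one feature decides which of the two results you get.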
Upvotes: 1