What is the formula behind Scikit-learn PolynomialFeatures?

I am using sklearn's PolynomialFeatures to convert my X values into polynomial features.

Before the transformation, my X values are:

[[ 1 11]
 [ 2 12]
 [ 3 13]
 [ 4 14]
 [ 5 15]
 [ 6 16]
 [ 7 17]
 [ 8 18]
 [ 9 19]
 [10 20]]

Note: X has two columns, i.e. there is more than one independent variable. The code I use for the transformation is:

from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
print(X_poly)

Sklearn returns the following matrix, which has more columns than just the squared values:

[[  1.   1.  11.   1.  11. 121.]
 [  1.   2.  12.   4.  24. 144.]
 [  1.   3.  13.   9.  39. 169.]
 [  1.   4.  14.  16.  56. 196.]
 [  1.   5.  15.  25.  75. 225.]
 [  1.   6.  16.  36.  96. 256.]
 [  1.   7.  17.  49. 119. 289.]
 [  1.   8.  18.  64. 144. 324.]
 [  1.   9.  19.  81. 171. 361.]
 [  1.  10.  20. 100. 200. 400.]]

I found this Stack Overflow answer https://stackoverflow.com/a/51906400/12188405 when I searched the web for my issue.

So can anyone please give me a general formula or Python code that returns this matrix for any degree? In simple words, I want to write a Python program that takes one parameter, the degree (any non-negative integer), and returns the same matrix that sklearn gives.

Upvotes: 0

Answers (3)

chowx054

Reputation: 21

This piqued my interest while I was working on a similar problem. To expand on @Reza Soltani's response, PolynomialFeatures(d) uses itertools.combinations_with_replacement (or a similar combinations function), looping over every degree from 1 to d and taking the product of each combination:

import math
import itertools

X = [2, 3, 4]
degree = 3
res = []

for i in range(1, degree + 1):
    # All multisets of size i drawn from X, e.g. (2, 2), (2, 3), ...
    for combo in itertools.combinations_with_replacement(X, i):
        # Each output feature is the product of one combination
        res.append(math.prod(combo))

print(res)
print(len(res))

# [2, 3, 4, 4, 6, 8, 9, 12, 16, 8, 12, 16, 18, 24, 32, 27, 36, 48, 64]
# 19

# degree = 1, 3 elems: [2, 3, 4]
# degree = 2, 6 elems: [4, 6, 8, 9, 12, 16]
# degree = 3, 10 elems: [8, 12, 16, 18, 24, 32, 27, 36, 48, 64]
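
To get from the flat list above to the full matrix from the question, the same idea can be applied row by row, with the degree-0 term supplying the leading column of ones. This is only a minimal sketch, assuming the default settings (bias column included, no interaction_only). The function name polynomial_features is hypothetical, and it works on plain Python lists rather than mirroring sklearn's actual implementation:

```python
import itertools
import math


def polynomial_features(X, degree):
    """Expand each row of X into all monomials of total degree 0..degree.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    out = []
    for row in X:
        new_row = []
        for d in range(degree + 1):
            # For d == 0 there is exactly one (empty) combination, and
            # math.prod(()) == 1, which gives the constant bias column.
            for combo in itertools.combinations_with_replacement(row, d):
                new_row.append(float(math.prod(combo)))
        out.append(new_row)
    return out


X = [[1, 11], [2, 12], [3, 13]]
for row in polynomial_features(X, 2):
    print(row)
# [1.0, 1.0, 11.0, 1.0, 11.0, 121.0]
# [1.0, 2.0, 12.0, 4.0, 24.0, 144.0]
# [1.0, 3.0, 13.0, 9.0, 39.0, 169.0]
```

The first three rows match the matrix printed in the question. Passing degree=0 returns only the column of ones.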

Upvotes: 0

Reza Soltani

Reputation: 151

I suggest you read the source code of Sklearn PolynomialFeatures in this link.

It has two different options:

interaction_only=True
- combinations('ABCD', 2) AB AC AD BC BD CD
interaction_only=False
- combinations_with_replacement('ABCD', 2) AA AB AC AD BB BC BD CC CD DD

The first one uses the combinations function from the itertools package, and the second one uses combinations_with_replacement to create the new features.
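
The two listings above can be reproduced directly with itertools. A short sketch, using the letters of 'ABCD' as stand-ins for four input features (the variable names are only illustrative):

```python
import itertools

features = "ABCD"

# interaction_only=True: each feature appears at most once per term
interactions = ["".join(c) for c in itertools.combinations(features, 2)]
print(interactions)
# ['AB', 'AC', 'AD', 'BC', 'BD', 'CD']

# interaction_only=False: features may repeat, so squares such as AA appear
all_terms = ["".join(c) for c in itertools.combinations_with_replacement(features, 2)]
print(all_terms)
# ['AA', 'AB', 'AC', 'AD', 'BB', 'BC', 'BD', 'CC', 'CD', 'DD']
```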

Upvotes: 2

Niko Fohr

Reputation: 33770

You could use the get_feature_names() method (renamed get_feature_names_out() from scikit-learn 1.0, with the old name removed in 1.2) to check the names of the columns in the returned matrix:

from sklearn.preprocessing import PolynomialFeatures
import numpy as np


X = np.arange(6).reshape(3, 2)

poly = PolynomialFeatures(10)
poly.fit(X)
poly.get_feature_names(['first', 'second'])

which will output

Out[12]:
['1',
 'first',
 'second',
 'first^2',
 'first second',
 'second^2',
 'first^3',
 'first^2 second',
 'first second^2',
 'second^3',
 'first^4',
 'first^3 second',
 'first^2 second^2',
 'first second^3',
 'second^4',
 'first^5',
 'first^4 second',
 'first^3 second^2',
 'first^2 second^3',
 'first second^4',
 'second^5',
 'first^6',
 'first^5 second',
 'first^4 second^2',
 'first^3 second^3',
 'first^2 second^4',
 'first second^5',
 'second^6',
 'first^7',
 'first^6 second',
 'first^5 second^2',
 'first^4 second^3',
 'first^3 second^4',
 'first^2 second^5',
 'first second^6',
 'second^7',
 'first^8',
 'first^7 second',
 'first^6 second^2',
 'first^5 second^3',
 'first^4 second^4',
 'first^3 second^5',
 'first^2 second^6',
 'first second^7',
 'second^8',
 'first^9',
 'first^8 second',
 'first^7 second^2',
 'first^6 second^3',
 'first^5 second^4',
 'first^4 second^5',
 'first^3 second^6',
 'first^2 second^7',
 'first second^8',
 'second^9',
 'first^10',
 'first^9 second',
 'first^8 second^2',
 'first^7 second^3',
 'first^6 second^4',
 'first^5 second^5',
 'first^4 second^6',
 'first^3 second^7',
 'first^2 second^8',
 'first second^9',
 'second^10']
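
The length of this list follows from counting monomials. With n input features and degree d (default settings, bias column included), there is one column per multiset of size 0 to d drawn from the n features, which adds up to the binomial coefficient C(n + d, d). A small sketch that checks the closed form against brute-force enumeration; both function names are hypothetical:

```python
import itertools
import math


def n_output_features(n_features, degree):
    # Closed form: number of monomials of total degree <= degree
    return math.comb(n_features + degree, degree)


def count_by_enumeration(n_features, degree):
    # Brute force: count every combination for each degree 0..degree
    return sum(
        1
        for d in range(degree + 1)
        for _ in itertools.combinations_with_replacement(range(n_features), d)
    )


print(n_output_features(2, 10))     # 66, the number of names listed above
print(count_by_enumeration(2, 10))  # 66
print(n_output_features(2, 2))      # 6, the column count in the question
```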

Upvotes: 0
