Pranay

Reputation: 111

Do combinations of existing features make new features?

Does it help classification if I add linear or non-linear combinations of the existing features? For example, does it help to add the mean and variance computed from the existing features as new features? I believe it definitely depends on the classification algorithm: in the case of PCA, the algorithm by itself generates new features that are orthogonal to each other and are linear combinations of the input features. But how does it affect decision-tree-based classifiers or others?

Upvotes: 7

Views: 8610

Answers (3)

Sole Galli

Reputation: 1082

There are open-source Python libraries that automate feature creation / combination:

We can automate polynomial feature creation with sklearn.

We can automatically create spline features with sklearn.

We can combine features mathematically with Feature-engine. With MathFeatures we combine feature groups, and with RelativeFeatures we combine feature pairs.
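A minimal sketch of all three (the toy dataframe, column names, and parameter choices are illustrative; MathFeatures lives in feature_engine.creation in recent Feature-engine versions):

    import pandas as pd
    from sklearn.preprocessing import PolynomialFeatures, SplineTransformer
    from feature_engine.creation import MathFeatures

    # Toy data; the column names are illustrative.
    df = pd.DataFrame({"x": [6.0, 3.0, 1.0, 4.0, 2.0],
                       "y": [1.0, 2.5, 4.0, 0.5, 3.0]})

    # Polynomial combinations (x, y, x^2, x*y, y^2) with sklearn.
    poly = PolynomialFeatures(degree=2, include_bias=False)
    X_poly = poly.fit_transform(df)

    # Spline basis expansion of each feature (sklearn >= 1.0).
    spline = SplineTransformer(degree=3, n_knots=4)
    X_spline = spline.fit_transform(df)

    # Mathematical combinations (mean, sum) of a feature group with Feature-engine.
    combiner = MathFeatures(variables=["x", "y"], func=["mean", "sum"])
    df_math = combiner.fit_transform(df)

    print(X_poly.shape, X_spline.shape, list(df_math.columns))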

Upvotes: 0

ffriend

Reputation: 28552

Yes, combinations of existing features can serve as new features and help classification. Moreover, a combination of a feature with itself (e.g. a polynomial of that feature) can be added as extra input for the classifier to use.

As an example, consider a logistic regression classifier with this linear formula at its core:

g(x, y) = 1*x + 2*y

Imagine that you have 2 observations:

  1. x = 6; y = 1
  2. x = 3; y = 2.5

In both cases g() will equal 8. If the observations belong to different classes, you have no way to distinguish them. But let's add one more variable (feature) z, which is a combination of the previous 2 features, z = x * y:

g(x, y, z) = 1*x + 2*y + 0.5*z

Now for same observations we have:

  1. x = 6; y = 1; z = 6 * 1 = 6 ==> g() = 11
  2. x = 3; y = 2.5; z = 3 * 2.5 = 7.5 ==> g() = 11.75

So now we get 2 different values and can distinguish between the 2 observations.

Polynomial features (x^2, x^3, y^2, etc.) do not combine different features like this; instead, they change the graph of the function. For example, g(x) = a0 + a1*x is a line, while g(x) = a0 + a1*x + a2*x^2 is a parabola and thus can fit the data much more closely.
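A short sketch of this effect with sklearn's LogisticRegression (the XOR-style data is illustrative): the classifier cannot separate the classes using x and y alone, but adding z = x * y as a third column makes them linearly separable in the enlarged feature space.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # XOR-like data: no line in the (x, y) plane separates the classes.
    X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
    y = np.array([0, 1, 1, 0])

    plain = LogisticRegression().fit(X, y)
    print(plain.score(X, y))  # at most 0.75: no linear boundary fits XOR

    # Add the interaction feature z = x * y as a third column.
    X_z = np.hstack([X, (X[:, 0] * X[:, 1]).reshape(-1, 1)])
    enriched = LogisticRegression(C=1e6).fit(X_z, y)  # weak regularization
    print(enriched.score(X_z, y))  # 1.0: the classes are now separable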

Upvotes: 18

Lars Kotthoff

Reputation: 109252

In general, it's always better to have more features. Unless you already have very predictive features (i.e. they allow for perfect separation of the classes to predict), I would always recommend adding more. In practice, many classification algorithms (and decision tree inducers in particular) select the best features for their purposes anyway, as the sketch below shows.
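A quick sketch of that last point with a decision tree (the dataset parameters are illustrative): trained on a mix of informative and noise features, the tree concentrates its splits on the informative ones, which shows up in feature_importances_.

    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier

    # 10 features, only 3 of which carry signal; the rest are noise.
    X, y = make_classification(n_samples=500, n_features=10, n_informative=3,
                               n_redundant=0, random_state=0)

    tree = DecisionTreeClassifier(random_state=0).fit(X, y)

    # Importances concentrate on the informative features; the inducer
    # effectively performs feature selection on its own.
    print(tree.feature_importances_.round(2))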

Upvotes: 1
