Anbarasu

How does VectorSlicer work in Spark 2.0?

The official Spark documentation describes it like this:

VectorSlicer is a transformer that takes a feature vector and outputs a new feature vector with a sub-array of the original features. It is useful for extracting features from a vector column.

I am clustering my data, and I need to find the features that contribute most to the clusters. Can I use VectorSlicer for this?

user7337271

Does this select the important features from the set of features?

It doesn't. It literally slices the vector, keeping only the indices you specify. It does not score or rank the features in any way.
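To make "literally slices" concrete, here is a minimal plain-Python sketch of what happens to a single dense vector. The helper `slice_vector` is hypothetical and is not Spark code. In PySpark the real transformer is `pyspark.ml.feature.VectorSlicer`, configured with `inputCol`, `outputCol` and `indices` (or `names`). The sketch assumes, as the Spark docs state, that selected entries come out in the order given and that indices must be distinct.

```python
from typing import List, Sequence


def slice_vector(vector: Sequence[float], indices: Sequence[int]) -> List[float]:
    """Hypothetical helper: keep only the entries at `indices`, in the given order.

    This mimics index-based slicing on one dense vector. Nothing here
    measures how useful a feature is; you must already know which
    positions you want.
    """
    if len(set(indices)) != len(indices):
        raise ValueError("indices must be distinct")
    for i in indices:
        if not 0 <= i < len(vector):
            raise IndexError(f"index {i} out of range for length {len(vector)}")
    return [vector[i] for i in indices]


features = [-2.0, 2.3, 0.0, 7.1]
print(slice_vector(features, [1, 3]))  # [2.3, 7.1]
```

The output depends only on the positions you pass in, which is why VectorSlicer cannot tell you which features matter for clustering.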

...and need the important features which will contribute to the clusters better.

  • If you have categorical data, consider using ChiSqSelector. Note that it operates on labeled data, so it needs a label column.

  • Otherwise, you can use a dimensionality-reduction technique such as PCA. It isn't the same as feature selection (each output component is a combination of the original features, not one of them), but it should provide similar benefits: keep the strongest signals and discard the rest.
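As a rough illustration of the PCA option, here is a minimal NumPy sketch of the idea: center the data, take the SVD, and project onto the top `k` directions. It is not Spark's implementation. In Spark the corresponding transformer is `pyspark.ml.feature.PCA` with `k`, `inputCol` and `outputCol`. The function name `pca_project` and the synthetic data are assumptions made for illustration only.

```python
import numpy as np


def pca_project(X, k):
    """Hypothetical helper: project rows of X onto their top-k principal components.

    Returns the projected data (n_samples x k) and the fraction of total
    variance explained by each of the k kept components.
    """
    X = np.asarray(X, dtype=float)
    if not 1 <= k <= X.shape[1]:
        raise ValueError("k must be between 1 and the number of features")
    centered = X - X.mean(axis=0)  # PCA operates on mean-centered data
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return centered @ vt[:k].T, explained[:k]


rng = np.random.default_rng(0)
signal = rng.normal(size=(200, 1))
noise = 0.05 * rng.normal(size=(200, 3))
# Three redundant columns driven by one underlying signal, plus small noise.
X = np.hstack([signal, 2 * signal, -signal]) + noise

Z, ratio = pca_project(X, k=1)
print(Z.shape)  # (200, 1)
print(ratio)    # close to 1.0: one component carries almost all the variance
```

The explained-variance ratio is one common way to choose `k` before clustering on the reduced vectors.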
