Abi2021

Reputation: 113

SelectKBest f_classif alternative in Spark MLlib

I'm trying to convert code from Python to Scala, and I'm stuck on a function that exists in scikit-learn but that I can't find in Spark for Scala:

from sklearn.feature_selection import SelectKBest
selector = SelectKBest(k=1).fit(X=x, y=y)

According to the documentation (https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectKBest.html), the default score function is f_classif.
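To make clear what needs to be reproduced on the Scala side, here is a minimal plain-Python sketch of top-k selection by the one-way ANOVA F-statistic, which is the score that f_classif is based on. The function names `anova_f_scores` and `select_k_best` are made up for illustration and are not part of scikit-learn or Spark; the sketch assumes dense numeric features, at least two classes, and more samples than classes, and it does not compute p-values.

```python
from collections import defaultdict


def anova_f_scores(X, y):
    """Return the one-way ANOVA F-statistic of each feature column in X,
    with samples grouped by their class label in y (hypothetical helper)."""
    n_samples = len(X)
    n_features = len(X[0])

    # Group row indices by class label.
    groups = defaultdict(list)
    for i, label in enumerate(y):
        groups[label].append(i)
    n_classes = len(groups)

    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        grand_mean = sum(column) / n_samples
        ss_between = 0.0  # variation of class means around the grand mean
        ss_within = 0.0   # variation of samples around their class mean
        for idx in groups.values():
            values = [column[i] for i in idx]
            mean = sum(values) / len(values)
            ss_between += len(values) * (mean - grand_mean) ** 2
            ss_within += sum((v - mean) ** 2 for v in values)
        ms_between = ss_between / (n_classes - 1)
        ms_within = ss_within / (n_samples - n_classes)
        if ms_within == 0:
            # Simplification chosen for this sketch: perfectly separated
            # feature scores infinity, constant feature scores zero.
            scores.append(float("inf") if ms_between > 0 else 0.0)
        else:
            scores.append(ms_between / ms_within)
    return scores


def select_k_best(X, y, k=1):
    """Return the sorted column indices of the k highest-scoring features."""
    scores = anova_f_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Toy data: feature 0 separates the two classes well, feature 1 does not.
X = [[1.0, 5.0], [1.2, 3.0], [3.0, 4.0], [3.2, 6.0]]
y = [0, 0, 1, 1]
print(select_k_best(X, y, k=1))  # prints [0]
```

The same per-feature computation (between-class mean square divided by within-class mean square, then a sort) is what a Scala port would have to express, for example with grouped aggregations.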

The Spark MLlib doc (http://spark.apache.org/docs/latest/ml-features.html#feature-selectors) lists only a few feature selectors, and I could not find an ANOVA-based one among them.

Is there an alternative package in Scala that selects the top k features based on the ANOVA F-test (f_classif)?

Upvotes: 2

Views: 408

Answers (1)

d-xa

Reputation: 524

In my view there are two options for you:

a) Wait until the beginning of 2021 for the Spark 3.1 release.

I had a look at the source code, and the ANOVASelector is already implemented; see:

https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/ANOVASelector.scala

@Since("3.1.0")
final class ANOVASelector @Since("3.1.0")(@Since("3.1.0") override val uid: String)   

It is just not released yet.

For the release window see https://spark.apache.org/versioning-policy.html

Spark 3.1 Release Window

Date            Event
Early Dec 2020  Code freeze. Release branch cut.
Mid Dec 2020    QA period. Focus on bug fixes, tests, stability and docs. Generally, no new features merged.
Early Jan 2021  Release candidates (RC), voting, etc. until final release passes.

or b) Take the source code from GitHub and add it to your own code, or compile the latest Spark version yourself.

The latter option will, of course, still leave you with some work on your side.

Hope this answer can help you a bit.

Upvotes: 1
