Reputation: 113
I'm trying to convert code from Python to Scala and got stuck on a function that exists in scikit-learn but that I could not find in Spark for Scala:
selector = SelectKBest(k=1).fit(X=x, y=y)
The documentation https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectKBest.html says that the default score function is f_classif.
Spark MLlib docs: http://spark.apache.org/docs/latest/ml-features.html#feature-selectors
Only
Is there an alternative package in Scala that selects the top k features based on the ANOVA F-test (f_classif)?
Upvotes: 2
Views: 408
Reputation: 524
In my view you have two options:
a) Wait for the Spark 3.1 release, which is expected at the beginning of 2021.
I had a look at the source code, and the ANOVASelector is already implemented:
@Since("3.1.0")
final class ANOVASelector @Since("3.1.0")(@Since("3.1.0") override val uid: String)
It is just not released yet.
For the release window see https://spark.apache.org/versioning-policy.html
Spark 3.1 Release Window
Date            Event
Early Dec 2020  Code freeze. Release branch cut.
Mid Dec 2020    QA period. Focus on bug fixes, tests, stability and docs. Generally, no new features merged.
Early Jan 2021  Release candidates (RC), voting, etc. until final release passes
or b) Take the source code from GitHub and add it to your own code, or compile the latest Spark version yourself.
The latter option will of course still leave you with some work on your side.
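If option b) feels too heavy, the statistic behind f_classif is small enough to compute by hand. Below is a minimal pure-Scala sketch of the one-way ANOVA F-statistic per feature, followed by a top-k selection. It is not Spark's implementation and does not use Spark at all; the names `AnovaTopK`, `fStatistic` and `selectKBest` are made up for this illustration, and it assumes the data fits in memory as rows of `Array[Double]` with integer class labels.

```scala
// Pure-Scala sketch (no Spark dependency): one-way ANOVA F-statistic per
// feature, then keep the k features with the largest F value.
// All names here are hypothetical and exist only for this example.
object AnovaTopK {

  /** One-way ANOVA F-statistic of one feature, grouped by class label. */
  def fStatistic(values: Seq[Double], labels: Seq[Int]): Double = {
    require(values.length == labels.length, "values and labels must align")
    val n = values.length
    val grandMean = values.sum / n

    // Split the feature values into one group per class label.
    val groups: Seq[Seq[Double]] =
      values.zip(labels).groupBy(_._2).values.map(_.map(_._1)).toSeq
    val numClasses = groups.length
    require(numClasses > 1 && n > numClasses,
      "need at least two classes and more samples than classes")

    // Between-group sum of squares: spread of the class means.
    val ssBetween = groups.map { g =>
      val mean = g.sum / g.length
      g.length * math.pow(mean - grandMean, 2)
    }.sum

    // Within-group sum of squares: spread inside each class.
    val ssWithin = groups.map { g =>
      val mean = g.sum / g.length
      g.map(v => math.pow(v - mean, 2)).sum
    }.sum

    // Note: a feature that is constant within every class gives
    // ssWithin == 0, so the result is Infinity or NaN.
    (ssBetween / (numClasses - 1)) / (ssWithin / (n - numClasses))
  }

  /** Indices of the k feature columns with the largest F-statistic. */
  def selectKBest(rows: Seq[Array[Double]], labels: Seq[Int], k: Int): Seq[Int] = {
    val numFeatures = rows.head.length
    (0 until numFeatures)
      .map(j => (j, fStatistic(rows.map(_(j)), labels)))
      .sortWith((a, b) => a._2 > b._2) // largest F first
      .take(k)
      .map(_._1)
  }
}
```

For example, with `rows = Seq(Array(1.0, 5.0), Array(1.2, 3.0), Array(0.8, 4.0), Array(3.0, 4.0), Array(3.2, 5.0), Array(2.8, 3.0))` and `labels = Seq(0, 0, 0, 1, 1, 1)`, calling `AnovaTopK.selectKBest(rows, labels, 1)` picks feature index 0, because only the first column differs between the two classes. For large datasets you would still want a distributed implementation, so treat this only as a way to check results or to bridge the gap.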
Hope this answer can help you a bit.
Upvotes: 1