Reputation: 7977
I have fit a PCA model in PySpark and I need to get the number of components from the model object.
from pyspark.ml.feature import PCA
pca = PCA(k=5, inputCol='features', outputCol='components')
pca_model = pca.fit(data)
I tried pca_model.k and pca_model.getParam('k'), but neither gives me the number of components. Both return the Param object instead:
>>> pca_model.k
Param(parent='PCA_4e66a98132a4fe4ad86c', name='k', doc='the number of principal components (> 0)')
>>> pca_model.getParam('k')
Param(parent='PCA_4e66a98132a4fe4ad86c', name='k', doc='the number of principal components (> 0)')
How do I get the number of components from PySpark's PCAModel object?
Upvotes: 2
Views: 297
Reputation: 35249
You can use its Java model:
pca_model._java_obj.getK()
or the getOrDefault method:
pca_model.getOrDefault("k")
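To see why pca_model.k and getParam('k') print a Param while getOrDefault returns the number, here is a rough pure-Python sketch of the pattern. ToyParam and ToyModel are hypothetical stand-ins invented for illustration, not PySpark classes, and the real Params implementation is considerably more involved; the sketch only assumes that the attribute holds metadata while the value lives in a separate map.

```python
# Simplified, hypothetical stand-ins for PySpark's Param / Params classes.
# They only illustrate the lookup pattern; they are not the real classes.

class ToyParam:
    """Metadata about a parameter: a name and a doc string, but no value."""

    def __init__(self, name, doc):
        self.name = name
        self.doc = doc

    def __repr__(self):
        return f"ToyParam(name={self.name!r}, doc={self.doc!r})"


class ToyModel:
    # The attribute holds the parameter's description, not its value.
    k = ToyParam("k", "the number of principal components (> 0)")

    def __init__(self, **values):
        self._default_map = {"k": None}  # defaults, keyed by param name
        self._param_map = dict(values)   # values that were set explicitly

    def getParam(self, name):
        # Returns the metadata object again, like the calls in the question.
        return getattr(type(self), name)

    def getOrDefault(self, name):
        # Prefer an explicitly set value, then fall back to the default.
        if name in self._param_map:
            return self._param_map[name]
        return self._default_map[name]


model = ToyModel(k=5)
print(model.k)                  # metadata only
print(model.getParam("k"))      # the same metadata object
print(model.getOrDefault("k"))  # the stored value
```

Of the two options above, getOrDefault is probably the safer choice, since _java_obj is a private attribute by naming convention. Newer Spark releases (3.0 and later, as far as I know) also appear to expose pca_model.getK() directly on the Python model.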
Upvotes: 2