Clock Slave

Reputation: 7977

PySpark PCA: get number of components from model object

I have fit a PCA model in PySpark and need to get the number of components from the model object.

from pyspark.ml.feature import PCA
pca = PCA(k=5, inputCol='features', outputCol='components')
pca_model = pca.fit(data)

I tried pca_model.k and pca_model.getParam('k'), but neither gives me the number of components; both return the Param object itself rather than its value:

>>> pca_model.k
Param(parent='PCA_4e66a98132a4fe4ad86c', name='k', doc='the number of principal components (> 0)')
>>> pca_model.getParam('k')
Param(parent='PCA_4e66a98132a4fe4ad86c', name='k', doc='the number of principal components (> 0)')

How do I get the number of components from PySpark's PCAModel object?

Upvotes: 2

Views: 297

Answers (1)

Alper t. Turker

Reputation: 35249

You can call the underlying Java model directly:

pca_model._java_obj.getK()

or use the getOrDefault method, which returns the value of the param instead of the Param object:

pca_model.getOrDefault("k")

Upvotes: 2
