In Spark MLlib, how do I save a BisectingKMeansModel to HDFS with Python?

Question

In Spark MLlib, BisectingKMeansModel in PySpark has no save/load function. Why? How can I save or load a BisectingKMeans model to HDFS with Python?

ylrax · Accepted Answer

It may be your Spark version. For bisecting k-means, a Spark version above 2.1.0 is recommended.

You can find a complete example in the documentation of the class pyspark.ml.clustering.BisectingKMeans. I hope it helps.

The last part of the example code includes a model save/load:

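from pyspark.ml.clustering import BisectingKMeansModel

# temp_path and model (a fitted BisectingKMeansModel) come from earlier in the example
model_path = temp_path + "/bkm_model"
model.save(model_path)
model2 = BisectingKMeansModel.load(model_path)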

It works for HDFS as well, but make sure the temp_path/bkm_model folder does not exist before saving the model, or you will get an error:

(java.io.IOException: Path /bkm_model already exists)
