user10058091

Reputation: 3

In Spark MLlib, how do I save a BisectingKMeansModel to HDFS with Python?

In Spark MLlib, the PySpark BisectingKMeansModel has no save/load function. Why? How can I save or load a BisectingKMeans model to HDFS with Python?

Upvotes: 0

Views: 661

Answers (1)

ylrax

Reputation: 26

It may be your Spark version. For bisecting k-means, Spark 2.1.0 or later is recommended.

You can find a complete example in the documentation for the class pyspark.ml.clustering.BisectingKMeans. Hope it helps:

https://spark.apache.org/docs/2.1.0/api/python/pyspark.ml.html#pyspark.ml.clustering.BisectingKMeans

The last part of the example code includes a model save/load:

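from pyspark.ml.clustering import BisectingKMeansModel

# model: a fitted pyspark.ml BisectingKMeansModel; temp_path: a local or HDFS directory
model_path = temp_path + "/bkm_model"
model.save(model_path)
model2 = BisectingKMeansModel.load(model_path)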

It works for HDFS as well, but make sure that the temp_path/bkm_model folder does not exist before saving the model, or it will give you an error:

(java.io.IOException: Path <temp_path>/bkm_model already exists)

Upvotes: 1
