Zhefu PENG

Reputation: 88

How to free the memory taken by a pyspark model (JavaModel)?

As described, I load a trained word2vec model through pyspark.

word2vec_model = Word2VecModel.load("saving path")

After using it, I want to delete it, because it takes up a lot of memory on a single node (I used the findSynonyms function, and the doc says it should only be used locally). I tried

import gc

del word2vec_model
gc.collect()

but that doesn't seem to work. Since the model is not an RDD, I can't use .unpersist(), and I didn't find anything like an unload() function in the doc.

Could anyone help me or give me some advice?

Upvotes: 0

Views: 450

Answers (1)

ebonnal

Reputation: 1167

You can ensure that the py4j gateway dereferences the object by running one of the following statements, where word2vec_model is a pyspark Transformer:

  • Given spark, a SparkSession:
spark.sparkContext._gateway.detach(word2vec_model._java_obj)
  • ... or given sc, a SparkContext:
sc._gateway.detach(word2vec_model._java_obj)

Explanations:

  1. Access the underlying wrapper object: your model is a pyspark Transformer, and each transformer holds an instance of JavaObject in a private _java_obj attribute.
  2. Access the SparkContext's py4j gateway.
  3. Call the gateway's detach method on the wrapper object (the JavaObject instance).
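
The three steps above can be wrapped in a small helper. This is only a sketch: release_java_model is a hypothetical name, not part of pyspark, and it relies on the private _java_obj and _gateway attributes described above, which may change between versions.

```python
import gc


def release_java_model(model, spark_context):
    """Ask the py4j gateway to drop its reference to the model's JVM object.

    Hypothetical helper (not a pyspark API). Returns True if a wrapped
    Java object was found and detached, False otherwise.
    """
    # Step 1: the wrapped JavaObject lives in the private _java_obj attribute.
    java_obj = getattr(model, "_java_obj", None)
    if java_obj is None:
        return False

    # Steps 2 and 3: reach the context's py4j gateway and detach the object.
    spark_context._gateway.detach(java_obj)

    # Clean up Python-side garbage as well; this does not force a JVM GC.
    gc.collect()
    return True
```

With a live session this would be called as release_java_model(word2vec_model, spark.sparkContext), followed by del word2vec_model. Detaching only removes the gateway's reference; the JVM should reclaim the memory on a later garbage collection of its own, provided nothing else on the Java side still refers to the model.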

Upvotes: 2
