Reputation: 88
As described, I load a trained word2vec model through pyspark.
from pyspark.ml.feature import Word2VecModel
word2vec_model = Word2VecModel.load("saving path")
After using it, I want to unload it, since it takes up a lot of memory on a single node (I used the findSynonyms function, and the docs say it should only be used locally). I tried:
del word2vec_model
gc.collect()
but that doesn't seem to work. Since it's not an RDD, I can't call .unpersist(), and I couldn't find anything like an unload() function in the docs.
Could anyone help or give me some advice?
Upvotes: 0
Views: 450
Reputation: 1167
You can ensure that the object is dereferenced by the py4j gateway by running the following statement:
Given word2vec_model, a pyspark Transformer:
With spark, a SparkSession: spark.sparkContext._gateway.detach(word2vec_model._java_obj)
With sc, a SparkContext: sc._gateway.detach(word2vec_model._java_obj)
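As a rough sketch, the detach call can be wrapped in a small helper. The name release_jvm_object is hypothetical (it is not part of pyspark), and the helper relies on the private _gateway and _java_obj attributes, so it may break between Spark versions:

```python
def release_jvm_object(spark, model):
    """Ask the py4j gateway to dereference the JVM object behind `model`.

    Hypothetical helper, not a pyspark API. It assumes `spark` is a
    SparkSession and `model` is a JVM-backed transformer that exposes the
    private `_java_obj` attribute.
    """
    # Private attribute: the py4j JavaGateway owned by the SparkContext.
    gateway = spark.sparkContext._gateway
    # JavaGateway.detach tells the gateway to drop its reference to the object.
    gateway.detach(model._java_obj)


# Intended usage, assuming a live SparkSession named `spark`:
#   word2vec_model = Word2VecModel.load("saving path")
#   ... call findSynonyms ...
#   release_jvm_object(spark, word2vec_model)
#   del word2vec_model
```

After detaching, the Python wrapper should not be used again, so dropping the Python name with del right away is a reasonable habit.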
Explanations:
word2vec_model is a Transformer, and each transformer holds an instance of JavaObject in a private _java_obj attribute.
Calling the detach method of SparkContext's py4j gateway on that wrapper object (the JavaObject instance) makes the gateway drop its reference to the underlying JVM object.
Upvotes: 2