Reputation: 1221
I am facing a java.lang.OutOfMemoryError: Java Heap Space error every second time I run the same Spark program.

Here is the scenario:

When I do a spark-submit and run the Spark program for the first time, it gives me the correct output and everything is fine. When I execute the same spark-submit one more time, it throws a java.lang.OutOfMemoryError: Java Heap Space exception.
When does it work again?
If I run the same spark-submit after clearing the Linux page cache (by writing to /proc/sys/vm/drop_caches, e.g. sync; echo 3 > /proc/sys/vm/drop_caches as root), it again runs successfully, but only once.
I tried setting all the relevant Spark configs, such as memoryOverhead, driver-memory, executor-memory, etc.
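For reference, this is roughly how I pass those settings; the class name, jar, and values below are placeholders, not my actual job:

    spark-submit \
      --driver-memory 4g \
      --executor-memory 4g \
      --conf spark.executor.memoryOverhead=1g \
      --class com.example.MyApp \
      my-app.jar

(On Spark versions before 2.3 running on YARN, the overhead setting is named spark.yarn.executor.memoryOverhead instead.)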
Any idea what's happening here? Is this really a problem with the Spark code, or is it happening because of some Linux machine setting or the way the cluster is configured?
Thanks.
Upvotes: 0
Views: 482
Reputation: 920
If you use df.persist() or df.cache(), then you should also call the df.unpersist() method once you are done with the DataFrame; there is also sqlContext.clearCache(), which clears everything that is cached.
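A minimal sketch of that lifecycle in Scala (the session setup, input path, and DataFrame here are placeholders, not taken from the original question):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("CacheDemo").getOrCreate()

    // Placeholder input; substitute your actual DataFrame
    val df = spark.read.json("/path/to/events.json")

    df.cache()       // or df.persist() with an explicit StorageLevel
    df.count()       // an action materializes the cache
    // ... reuse df across several computations ...
    df.unpersist()   // release the cached blocks once finished

    spark.sqlContext.clearCache()  // drops everything cached in this session

Calling unpersist() as soon as a cached DataFrame is no longer needed frees executor storage memory for the rest of the job.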
Upvotes: 0