Reputation: 31
I have come across some abnormal behaviour.
I have a query (inside a loop) with inner joins over 5 tables: one is around 200 MB and all the others are under 10 MB (all are persisted at the start of each loop iteration and unpersisted at the end).
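For context, here is a minimal sketch of that pattern (the paths, table names and join key are made up for illustration; only the join-inside-a-loop and persist/unpersist structure is from my actual job):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

val spark = SparkSession.builder().appName("looped-joins").getOrCreate()

for (batch <- 1 to 10) {
  // one ~200 MB table plus four small (<10 MB) tables
  val big    = spark.read.parquet(s"/data/big/batch=$batch")
  val smalls = Seq("dim1", "dim2", "dim3", "dim4").map(n => spark.read.parquet(s"/data/$n"))

  // persisted at the start of the loop iteration
  (big +: smalls).foreach(_.persist(StorageLevel.MEMORY_AND_DISK))

  // inner joins over the 5 tables; "key" is a hypothetical join column
  val joined = smalls.foldLeft(big)((acc, d) => acc.join(d, Seq("key")))
  joined.write.mode("overwrite").parquet(s"/out/batch=$batch")

  // unpersisted at the end of the loop iteration
  (big +: smalls).foreach(_.unpersist())
}
```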
Whenever spark.sql.autoBroadcastJoinThreshold is enabled (I tried the default, 5 MB, 1 MB and 100 KB), running the same query multiple times keeps increasing driver memory usage until it eventually fails with an out-of-memory error (WARN TaskMemoryManager: Failed to allocate a page (16777216 bytes), try again.).
But if I run the same thing with spark.sql.autoBroadcastJoinThreshold=-1, it works without any issues.
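For reference, a sketch of how I toggle the setting at runtime (assuming spark is the active SparkSession; the same values can also be passed with --conf at submit time):

```scala
// Disable automatic broadcast joins entirely (the setting that works for me):
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

// Or keep broadcasting but cap it explicitly, e.g. at 5 MB (value is in bytes):
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", (5L * 1024 * 1024).toString)
```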
My Spark (2.0.0) configuration is:
driver memory: 10g, executor memory: 20g, cores: 3, nodes: 5
(I suspect I'm allocating more resources than needed, but it doesn't work even if I reduce executor memory to 4g; it gets through the same number of iterations regardless of the memory configuration.)
PS: I am not creating any broadcast variables manually, and I am new to Spark.
Upvotes: 3
Views: 4317
Reputation: 2825
Looking at the stack trace, it appears the dataset being broadcast is around 16 MB, so you might want to set the broadcast threshold higher than 16 MB to see if that works.
The other option, which you have already mentioned, is to disable broadcasting entirely, but you would want to check the performance of your SQL to see whether there is any adverse impact.
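For example (a sketch only: joinedDf below stands in for the DataFrame your query produces, spark is your active SparkSession, and the threshold value is in bytes):

```scala
// Raise the threshold above the ~16 MB broadcast seen in the warning, e.g. to 32 MB:
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", (32L * 1024 * 1024).toString)

// Then compare the physical plans: with the higher threshold the small tables
// should appear under BroadcastHashJoin, while with -1 the join falls back to
// SortMergeJoin, which avoids the driver-side broadcast but is often slower.
joinedDf.explain()
```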
Upvotes: 0