Antalagor

Reputation: 438

Spark Shuffle Memory Overhead Issues

I have some recurring problems in designing Spark Jobs (using Spark 2.3.x).

In a nutshell:

I wonder how single tasks can have such a high memory consumption. In my understanding of how Spark works, it should be possible to make the tasks small enough that they fit into memory. Also, the fact that a few tasks account for the majority of the runtime is a sign of sub-optimal parallelization. The data within a grouping unit (group -> all rows that match a key for groupBy or join) is not very large. (Aggregating a single group key cannot cause the memory issues on its own.)

Things I already tried:

Any ideas to improve performance & stability?

Edit:

Some further investigation revealed that we do indeed have very skewed datasets. It seems that the map operations for a few very large rows are far too large for the Spark executors to handle.
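
For reference, a minimal sketch of how one might check the per-key counts (Scala; "df" and "key" are placeholders for the real dataset and grouping/join column):

import org.apache.spark.sql.functions._

// Count rows per grouping/join key and inspect the largest groups.
val keyCounts = df
  .groupBy(col("key"))
  .count()
  .orderBy(desc("count"))

keyCounts.show(20, false) // the top keys show how skewed the data is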

Upvotes: 2

Views: 1442

Answers (1)

Antalagor

Reputation: 438

I increased the shuffle partition count, massively increased the executor memory, and changed the configuration settings that were recommended in the Spark error logs. For now the job runs without warnings/errors, but the runtime has increased severely.

--executor-memory 32g
--driver-memory 16g
--conf spark.executor.memoryOverhead=8g
--conf spark.driver.maxResultSize=4g
--conf spark.sql.shuffle.partitions=3000
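
As a side note, spark.sql.shuffle.partitions can also be changed per session at runtime (sketch below, Scala, Spark 2.3); the memory and overhead settings still have to be supplied at submit time as above:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("shuffle-tuning")
  .getOrCreate()

// Raise the number of partitions used for shuffles in Spark SQL joins/aggregations.
// Executor/driver memory and memoryOverhead cannot be changed on a running
// application, so those stay in the spark-submit flags.
spark.conf.set("spark.sql.shuffle.partitions", "3000")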

Upvotes: 1
