Reputation: 438
I have some recurring problems when designing Spark jobs (using Spark 2.3.x).

In a nutshell: the jobs perform aggregations (.groupby or .join operations on large dataframes with fine granularity), and afterwards the results are written to disk (parquet). During execution the jobs keep failing with errors like:

yarn.YarnAllocator: Container killed by YARN for exceeding memory limits. 18.0 GB of 18 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.

org.apache.spark.SparkException: Job aborted due to stage failure: Task 175 in stage 92.0 failed 4 times
I wonder how single tasks can have such high memory consumption. In my understanding of how Spark works, it should be possible to make the tasks small enough that they fit into memory. The fact that a few tasks account for the majority of the runtime is also a sign of sub-optimal parallelization. The data within a single grouping unit (group = all rows matching one key of the groupBy or join) is not very large, so the aggregation of a single group key alone cannot cause the memory issues.
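One way to verify whether a handful of keys dominates the groupBy/join is to inspect the key distribution directly. A minimal PySpark sketch, where df and "group_key" stand in for the actual dataframe and key column:

# Count rows per grouping key to see whether a few keys dominate the shuffle.
# "df" and "group_key" are placeholders for the real dataframe and key column.
from pyspark.sql import functions as F

key_counts = (
    df.groupBy("group_key")
      .count()
      .orderBy(F.desc("count"))
)
key_counts.show(20, truncate=False)   # the top rows reveal the skewed keys, if any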
Things I already tried:
Increasing spark.sql.shuffle.partitions -> reduced the fail rate, but also increased the runtime.

Any ideas to improve performance & stability?
edit:
Some further investigation revealed that we indeed have very skewed datasets. It seems the map operations for a few very large rows are simply too much for the Spark executors to handle.
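For skewed groupBy/join keys, one common mitigation is to salt the key so that a single hot key is spread over several tasks. A sketch, assuming a sum-style aggregation over hypothetical columns "group_key" and "value" (this helps when a few keys hold very many rows; it does not help if individual rows are themselves too large for an executor):

# Key salting sketch: spread a hot key over NUM_SALTS buckets, aggregate
# partially per (key, salt), then combine the partial results per key.
# Column names and the salt count are assumptions.
from pyspark.sql import functions as F

NUM_SALTS = 32   # assumed number of salt buckets, tune to the degree of skew

salted = df.withColumn("salt", (F.rand() * NUM_SALTS).cast("int"))

# First aggregate per (key, salt) so a hot key is handled by many tasks ...
partial = (salted.groupBy("group_key", "salt")
                 .agg(F.sum("value").alias("partial_sum")))

# ... then combine the partial results per key.
result = (partial.groupBy("group_key")
                 .agg(F.sum("partial_sum").alias("total")))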
Upvotes: 2
Views: 1442
Reputation: 438
I increased the shuffle partition count, massively increased the executor memory, and changed the configuration settings that were recommended in the Spark error logs. For now the job runs without warnings/errors, but the runtime is severely increased. The settings I ended up with:
--executor-memory 32g
--driver-memory 16g
--conf spark.executor.memoryOverhead=8g
--conf spark.driver.maxResultSize=4g
--conf spark.sql.shuffle.partitions=3000
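For reference, a rough sketch of how the non-memory settings could also be supplied from inside a PySpark application (the app name is made up; the executor and driver memory sizes are usually left on spark-submit as above, since the driver JVM is already running by the time this code executes):

# Mirrors the --conf flags above for the settings that can live in code.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("skewed-aggregation-job")              # hypothetical name
         .config("spark.driver.maxResultSize", "4g")
         .config("spark.sql.shuffle.partitions", "3000")
         .getOrCreate())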
Upvotes: 1