Anoop Deshpande

Reputation: 582

How to set partition count while reading CSV extract using SparkSession?

I'm reading a CSV extract using SparkSession, and the framework is trying to create 10K partitions/tasks. Because of this huge task count, the Spark job fails with java.lang.OutOfMemoryError: GC overhead limit exceeded.

Below is the sample code that creates 10K tasks while reading the extract:

sparkSession
    .read
    .csv(dfwAbsHdfsPath)

Is there any way I can reduce or configure the task/partition count? Just to add, we are using Spark version 2.3.1.

Upvotes: 2

Views: 164

Answers (1)

Robert Kossendey

Reputation: 6998

You can change the default parallelism in the config like this:

.config("spark.default.parallelism", NUMBER)

This setting influences how many partitions Spark uses by default.
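For context, here is a minimal sketch of where that config line fits when building the SparkSession; the app name "MyApp" and the value 200 are placeholder choices, and dfwAbsHdfsPath is the path variable from the question. As an alternative, coalesce can explicitly reduce the partition count after the read.

import org.apache.spark.sql.SparkSession

// Sketch only: "MyApp" and the value 200 are placeholders, not values from the question.
val sparkSession = SparkSession.builder()
  .appName("MyApp")
  .config("spark.default.parallelism", "200")
  .getOrCreate()

// dfwAbsHdfsPath is the question's HDFS path variable.
val df = sparkSession
  .read
  .csv(dfwAbsHdfsPath)

// Alternative: collapse the DataFrame to fewer partitions after reading.
val reduced = df.coalesce(200)

Unlike repartition, coalesce avoids a full shuffle when reducing the partition count, which keeps the overhead low.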

Upvotes: 1
