Murali

Reputation: 65

Spark No space left on device

I have an EMR job that reads around 1 TB of data, filters it, and repartitions it (there are some joins after the repartition). However, the job fails at the repartition step with the error "No space left on device". I tried changing "spark.local.dir", but it had no effect. The job completes only on d2.4xlarge instances; it fails on r3.4xlarge, which has a similar number of cores and amount of RAM. I couldn't find the root cause of this issue. Any help would be appreciated.

Thank you for your time.

Upvotes: 1

Views: 1256

Answers (1)

runwuf

Reputation: 1717

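I had the same issue on Spark 2.2. I was able to change the scratch directory by setting SPARK_LOCAL_DIRS=/path/to/other/tmp in $SPARK_HOME/conf/spark-env.sh. The documentation for spark.local.dir explains why setting that property alone may have no effect: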

"spark.local.dir /tmp
Directory to use for "scratch" space in Spark, including map output files and RDDs that get stored on disk. This should be on a fast, local disk in your system. It can also be a comma-separated list of multiple directories on different disks. NOTE: In Spark 1.0 and later this will be overridden by SPARK_LOCAL_DIRS (Standalone, Mesos) or LOCAL_DIRS (YARN) environment variables set by the cluster manager." https://spark.apache.org/docs/latest/configuration.html
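
Since the scratch space holds map output files and RDDs stored on disk, it can help to check how much free space each candidate directory actually has before pointing Spark at it. Below is a minimal sketch in plain Python (standard library only, no Spark API involved). The helper names `free_gib` and `pick_local_dirs`, the candidate paths `/mnt/tmp` and `/mnt1/tmp`, and the 1 GiB threshold are all assumptions for illustration, not values from the question or from Spark.

```python
import os
import shutil
import tempfile


def free_gib(path):
    """Return the free space on the filesystem holding `path`, in GiB."""
    return shutil.disk_usage(path).free / 2**30


def pick_local_dirs(candidates, min_free_gib):
    """Keep the existing directories with at least `min_free_gib` free
    and join them into a comma-separated list, the format that
    spark.local.dir / SPARK_LOCAL_DIRS accept."""
    usable = [
        d for d in candidates
        if os.path.isdir(d) and free_gib(d) >= min_free_gib
    ]
    return ",".join(usable)


if __name__ == "__main__":
    # Hypothetical candidate paths; replace with the mounts on your nodes.
    candidates = [tempfile.gettempdir(), "/mnt/tmp", "/mnt1/tmp"]
    for d in candidates:
        if os.path.isdir(d):
            print(f"{d}: {free_gib(d):.1f} GiB free")
    # Hypothetical threshold of 1 GiB, chosen only for the demo.
    value = pick_local_dirs(candidates, min_free_gib=1.0)
    print(f"SPARK_LOCAL_DIRS={value}")
```

The printed line could be pasted into spark-env.sh as described above. As the quoted note says, on YARN the cluster manager's LOCAL_DIRS value takes precedence, so this would only be a diagnostic there.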

Upvotes: 1
