clay

Reputation: 20370

Spark: "No space available in any of the local directories" on a simple job

Here is a simple test program that writes a tiny amount of data:

from pyspark.sql import Row, SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# "spark" is predefined in pyspark shells and EMR notebooks; create it
# explicitly so the script also runs standalone.
spark = SparkSession.builder.getOrCreate()

schema = StructType([
    StructField("cola", StringType()),
    StructField("colb", IntegerType()),
])

rows = [
    Row("alpha", 1),
    Row("beta", 2),
    Row("gamma", 3),
    Row("delta", 4)
]

data_frame = spark.createDataFrame(rows, schema)

print("count={}".format(data_frame.count()))

data_frame.write.save("s3a://my-bucket/test_data.parquet", mode="overwrite")

print("done")

This fails with:

Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: No space available in any of the local directories.
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:366)
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.createTmpFileForWrite(LocalDirAllocator.java:416)

This is running on Amazon EMR with S3 storage. There is plenty of disk space. Can anyone explain?

Upvotes: 3

Views: 4582

Answers (1)

Jay

Reputation: 1062

I ran into the same error while using Spark 2.2 on EMR. The settings fs.s3a.fast.upload=true and fs.s3a.buffer.dir="/home/hadoop,/tmp" (or any other directory, for that matter) did not help me. It seems my issue was related to shuffle space instead.
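
For reference, here is roughly how those settings can be applied when the session is built in code rather than in an interactive shell; Hadoop (fs.s3a.*) options take the spark.hadoop. prefix when set through Spark. This is only a sketch of what I tried, since these settings did not fix the error in my case, and the directory list is just an example:

from pyspark.sql import SparkSession

# Hadoop (fs.s3a.*) options are passed through Spark with the
# "spark.hadoop." prefix. The buffer directory list is an example value.
spark = (
    SparkSession.builder
    .config("spark.hadoop.fs.s3a.fast.upload", "true")
    .config("spark.hadoop.fs.s3a.buffer.dir", "/home/hadoop,/tmp")
    .getOrCreate()
)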

I had to add --conf spark.shuffle.service.enabled=true to my spark-submit / spark-shell to resolve this error.
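
The same flag can also be set when the session is created in code. A minimal sketch, assuming the conf is set before the session exists (spark.shuffle.service.enabled is a static conf and cannot be changed at runtime):

from pyspark.sql import SparkSession

# Equivalent to passing --conf spark.shuffle.service.enabled=true on the
# command line. Static conf: it must be set before the session is created.
spark = (
    SparkSession.builder
    .config("spark.shuffle.service.enabled", "true")
    .getOrCreate()
)

# Sanity check: confirm the setting took effect.
print(spark.sparkContext.getConf().get("spark.shuffle.service.enabled"))

On YARN this relies on Spark's external shuffle service running as a NodeManager auxiliary service, which EMR releases generally configure for you.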

Upvotes: 3
