Reputation: 475
I am getting the error below. The Spark local dir (SPARK_LOCAL_DIRS) has been set and has enough space and inodes left.
java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:326)
at org.apache.spark.storage.TimeTrackingOutputStream.write(TimeTrackingOutputStream.java:58)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
at org.xerial.snappy.SnappyOutputStream.dumpOutput(SnappyOutputStream.java:294)
at org.xerial.snappy.SnappyOutputStream.compressInput(SnappyOutputStream.java:306)
at org.xerial.snappy.SnappyOutputStream.rawWrite(SnappyOutputStream.java:245)
at org.xerial.snappy.SnappyOutputStream.write(SnappyOutputStream.java:107)
at org.apache.spark.io.SnappyOutputStreamWrapper.write(CompressionCodec.scala:190)
at org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:218)
at org.apache.spark.util.collection.ChainedBuffer.read(ChainedBuffer.scala:56)
at org.apache.spark.util.collection.PartitionedSerializedPairBuffer$$anon$2.writeNext(PartitionedSerializedPairBuffer.scala:137)
at org.apache.spark.util.collection.ExternalSorter.writePartitionedFile(ExternalSorter.scala:757)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:70)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
cat spark-env.sh | grep -i local
export SPARK_LOCAL_DIRS=/var/log/hadoop/spark
disk usage:
df -h /var/log/hadoop/spark
Filesystem        Size  Used Avail Use% Mounted on
/dev/mapper/meta  200G  1.1G  199G    1% /var/log/hadoop

inodes:
df -i /var/log/hadoop/spark
Filesystem        Inodes     IUsed  IFree      IUse% Mounted on
/dev/mapper/meta  209711104  185    209710919     1% /var/log/hadoop
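To double-check, usage can be watched while the job runs (a minimal sketch; the directory list is an assumption, adjust it to whatever local dirs the executors actually spill to):

# refresh disk usage every 5 seconds for the candidate local dirs
watch -n 5 'df -h /var/log/hadoop/spark /tmp'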
Upvotes: 1
Views: 5917
Reputation: 5185
Please check how many inodes are being used by Hadoop. If they are all exhausted, you get the same generic "No space left on device" error even though there is still free space.
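A minimal way to do that check (a sketch; the paths are just the ones from the question, not known culprits):

# inode usage per filesystem
df -i /var/log/hadoop/spark /tmp
# count the files under a suspect directory, staying on one filesystem
find /var/log/hadoop/spark -xdev -type f | wc -l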
Upvotes: 0
Reputation: 704
I also encountered the same issue. To resolve it, I first checked my HDFS disk usage by running hdfs dfsadmin -report. The Non DFS Used column was above 250 GB, which implied that my logs, tmp, or intermediate data were consuming too much space. After running du -lh | grep G from the root folder, I found that spark/work was consuming over 200 GB. Looking at the folders inside spark/work, I realised I had forgotten to comment out a System.out.println statement, and hence the logs were consuming a lot of space.
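For reference, the checks described above look roughly like this (a sketch; the paths and depth are examples, and spark.worker.cleanup.enabled only applies to standalone workers):

# per-datanode report, including the "Non DFS Used" figure
hdfs dfsadmin -report
# find the directories eating space (run from / as root)
du -h --max-depth=2 / 2>/dev/null | grep G | sort -rh | head -20
# if spark/work is the culprit, old application dirs can be removed, or
# standalone workers can clean them up automatically:
#   spark.worker.cleanup.enabled true   (in spark-defaults.conf)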
Upvotes: 3
Reputation: 15141
If you're running YARN in yarn-cluster mode, the local dirs used by both the Spark executors and the driver are taken from the YARN config (yarn.nodemanager.local-dirs); spark.local.dir and your env variable will be ignored.
If you're running YARN in yarn-client mode, the executors again use the local dirs from the YARN config, but the driver uses the one you specified in your env variable, because in that mode the driver is not run on the YARN cluster.
So try setting that config.
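A minimal sketch of what that means in practice (the paths, property values, and your-app.jar below are examples, not taken from the question):

# yarn-cluster mode: point the NodeManager local dirs at a large volume in
# yarn-site.xml on every node, then restart the NodeManagers:
#   <property>
#     <name>yarn.nodemanager.local-dirs</name>
#     <value>/data1/yarn/local,/data2/yarn/local</value>
#   </property>
# yarn-client mode: the executors still use the NodeManager dirs, but the
# driver's local dir can be set when submitting:
spark-submit \
  --master yarn \
  --deploy-mode client \
  --conf spark.local.dir=/var/log/hadoop/spark \
  your-app.jar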
You can find a bit more information in the documentation, and there's even a whole section on running Spark on YARN.
Upvotes: 1