赵祥宇
赵祥宇

Reputation: 497

Spark: java.io.IOException: No space left on device

Now I am learning how to use spark.I have a piece of code which can invert a matrix and it works when the order of the matrix is small like 100.But when the order of the matrix is big like 2000 I have an exception like this:

15/05/10 20:31:00 ERROR DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file /tmp/spark-local-20150510200122-effa/28/temp_shuffle_6ba230c3-afed-489b-87aa-91c046cadb22

java.io.IOException: No space left on device

In my program I have lots of lines like this:

val result1=matrix.map(...).reduce(...)
val result2=result1.map(...).reduce(...)
val result3=matrix.map(...)

(sorry about that because the code is to many to write there)

So I think when I do this Spark create some new rdds,and in my program Spark creates too many rdds so I have the exception.I am not sure if what I thought is correct.

How can I delete the rdds that I won't use any more?Like result1 and result2?

I have tried rdd.unpersist(), it doesn't work.

Upvotes: 10

Views: 20101

Answers (3)

rahul gulati
rahul gulati

Reputation: 188

This is because Spark create some temp shuffle files under /tmp directory of you local system.You can avoid this issue by setting below properties in your spark conf files.

Set the following properties in spark-env.sh.
(change the directories accordingly to whatever directory in your infra, that has write permissions set and with enough space in it)

SPARK_JAVA_OPTS+=" -Dspark.local.dir=/mnt/spark,/mnt2/spark -Dhadoop.tmp.dir=/mnt/ephemeral-hdfs"

export SPARK_JAVA_OPTS

You can also set the spark.local.dir property in $SPARK_HOME/conf/spark-defaults.conf as stated by @EUgene below

Upvotes: 12

Eugene
Eugene

Reputation: 11085

As a complementary, to specify default folder for you shuffle tmp files, you can add below line to $SPARK_HOME/conf/spark-defaults.conf:

spark.local.dir /mnt/nvme/local-dir,/mnt/nvme/local-dir2

Upvotes: 1

yjshen
yjshen

Reputation: 6693

According to the Error message you have provided, your situation is no disk space left on your hard-drive. However, it's not caused by RDD persistency, but shuffle which you implicitly required when calling reduce.

Therefore, you should clear your drive and make more spaces for your tmp folder

Upvotes: 4

Related Questions