Reputation: 145
I have the following file systems when I run df -h on the master or any of the slaves:
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1      7.9G  4.4G  3.5G  57% /
tmpfs           7.4G  4.0K  7.4G   1% /dev/shm
/dev/xvdb        37G  3.3G   32G  10% /mnt
/dev/xvdf        37G  2.0G   34G   6% /mnt2
/dev/xvdv       500G   33M  500G   1% /vol0
My spark-env.sh looks like this:
export SPARK_WORKER_DIR="/vol0"
export SPARK_WORKER_CORES=2
export SPARK_WORKER_OPTS="-Dspark.local.dir=/vol0"
export SPARK_LOCAL_DIRS="/vol0/"
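A note on precedence, since both knobs are set above: in standalone mode the worker environment variable SPARK_LOCAL_DIRS overrides spark.local.dir, so if shuffle data still lands in /mnt/spark the usual cause is that the edited spark-env.sh never reached all the workers, or the workers were not restarted after the change. For completeness, the equivalent property-file form (a config fragment; it goes in conf/spark-defaults.conf on the driver) would be:

```
spark.local.dir    /vol0
```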
But I am still getting a "No space left on device" error, and the job gets terminated while saving files.
I have one dataset of 200 files of 1 GB each and another dataset of 200 files of 45 MB each. I am joining them and saving the result to a new file in S3.
DataFrame dataframe1 = sqlContext.read().json(outputGrowth).coalesce(50);
dataframe1.registerTempTable("dataframe1");

DataFrame dataframe2 = sqlContext.read().json(pdiPath);
dataframe2.registerTempTable("dataframe2");

// Joining the two tables (actual join query elided)
String query = "join dataframe1 and dataframe2";
DataFrame resultPDI = sqlContext.sql(query);

// Release the input DataFrames once the join result exists
dataframe1.unpersist();
dataframe2.unpersist();

resultPDI.write().mode("overwrite").json(outputPDI);
So, how can I make Spark store its scratch data in /vol0 instead of /mnt/spark?
I have tried different solutions from Stack Overflow and some blogs, but none of them are working for me.
Can anyone help me get rid of this problem? I am using 10 m1.large instances on AWS.
Upvotes: 2
Views: 523
Reputation: 3692
If you are using Ubuntu, you can simply create a symlink from /mnt/spark to /vol0, as below:
ln -s /vol0 /mnt/spark
With the help of the symlink, anything written to /mnt/spark actually lands on /vol0. For more info see http://ubuntuhak.blogspot.in/2013/04/symbolic-links-in-ubuntu.html
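Note that ln -s fails if /mnt/spark already exists as a directory, so the old directory has to be moved aside first (on the real cluster, stop the worker before doing this). A minimal sketch of the full sequence, demonstrated on throwaway paths under a temp directory so it is safe to run anywhere; the real paths would be /mnt/spark and /vol0:

```shell
# Throwaway stand-ins for /mnt/spark and /vol0 (assumed paths)
demo=$(mktemp -d)
mkdir -p "$demo/mnt/spark" "$demo/vol0"

# 1. Move the existing scratch directory aside
mv "$demo/mnt/spark" "$demo/mnt/spark.old"

# 2. Symlink the old location to the big volume
ln -s "$demo/vol0" "$demo/mnt/spark"

# 3. Writes through the old path now land on the new volume
echo shuffle-data > "$demo/mnt/spark/test.txt"
ls "$demo/vol0"              # shows test.txt
readlink "$demo/mnt/spark"   # points at the vol0 stand-in
```

On the cluster itself, the same three steps (move aside, symlink, restart the worker) redirect every write to /mnt/spark onto /vol0 without touching the Spark configuration at all.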
Upvotes: 1