Reputation: 840
I am a beginner to Spark and I am running my application to read 14KB of data from a text file, do some transformations and actions (collect, collectAsMap) and save the data to a database.
I am running it locally on my MacBook with 16G of memory and 8 logical cores.
Java Max heap is set at 12G.
Here is the command I use to run the application.
bin/spark-submit --class com.myapp.application --master local[*] --executor-memory 2G --driver-memory 4G /jars/application.jar
I am getting the following warning
2017-01-13 16:57:31.579 [Executor task launch worker-8hread] WARN org.apache.spark.storage.MemoryStore - Not enough space to cache rdd_57_0 in memory! (computed 26.4 MB so far)
Can anyone guide me on what is going wrong here and how I can improve performance? Also, how can I optimize shuffle spill? Here is a view of the spill that happens on my local system
Upvotes: 16
Views: 73387
Reputation: 7990
Running executors with too much memory often results in excessive garbage collection delays, so assigning more memory is not a good idea. Since you have only 14KB of data, 2GB of executor memory and 4GB of driver memory are more than enough, and there is no point in assigning this much. You can run this job with even 100MB of memory, and performance will be better than with 2GB.
Driver memory is more useful when you run the application in yarn-cluster mode, because the application master runs the driver. Here you are running your application in local mode, so driver-memory is not necessary. You can remove this configuration from your job.
In your application you have assigned
Java Max heap is set at: 12G.
executor-memory: 2G
driver-memory: 4G
Total memory allotment = 18GB, while your MacBook has only 16GB of memory. Here you have allocated more than the total RAM to your Spark application.
This is not good. The operating system itself consumes approximately 1GB of memory, and you might have other applications running that also consume RAM. So you are actually allocating more memory than you have, and this is the root cause of your application throwing the error "Not enough space to cache the RDD".
Reduce the executor memory to executor-memory 1G or less, and remove driver-memory from your configuration. Submit your job. It will run smoothly.
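As a rough sketch of that suggestion, the command from the question could be trimmed as shown below. The class name and jar path are taken from the question; 1G is just the upper end of the range suggested above, and `DRY_RUN` is a hypothetical convenience variable added here (not a Spark option) so the command can be inspected before it is actually run.

```shell
# Sketch only: the question's command with a smaller executor heap and
# without --driver-memory. DRY_RUN is a hypothetical helper variable,
# not part of Spark; it defaults to "1" (print the command, do not run it).
set -- bin/spark-submit \
  --class com.myapp.application \
  --master 'local[*]' \
  --executor-memory 1G \
  /jars/application.jar

if [ "${DRY_RUN:-1}" = "1" ]; then
  # Show the assembled command on one line.
  echo "$*"
else
  # Actually submit the job.
  "$@"
fi
```

Running it with `DRY_RUN=0` submits the job; leaving the variable unset only prints the command.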
If you are keen to learn more about Spark memory management techniques, refer to this useful article:
Spark on yarn executor resource allocation
Upvotes: 19
Reputation: 155
In local mode, you don't need to specify the master; using the default arguments is OK. The official website says, "The spark-submit script in Spark’s bin directory is used to launch applications on a cluster. It can use all of Spark’s supported cluster managers through a uniform interface so you don’t have to configure your application specially for each one." So you are better off using spark-submit on a cluster; locally you can use spark-shell.
Upvotes: -1