Reputation: 13
I have three Ubuntu VMs (clones) in my local machine which i wanted to use to make a simple cluster. One VM to be used as a master and the other two as slaves. I can ssh every VM from every other one succesfully and i have the ip's of the two slaves in the conf/slaves file of the master and the master's ip in the spark-env.sh of every VM.When I run
start-slave.sh spark://master-ip:7077
from the slaves,they appear in the spark UI. But when i try to run things in parallel i always get the message about the resources. For testing code i use the scala shell
spark-shell --master://master-ip:7077
and sc.parallelize(1 until 10000).count
.
Upvotes: 1
Views: 5862
Reputation: 770
Do You mean that warn: WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster ui to ensure that workers are registered and have sufficient memory
This message will pop up any time an application is requesting more resources from the cluster than the cluster can currently provide.
Spark is only looking for two things: Cores and Ram. Cores represents the number of open executor slots that your cluster provides for execution. Ram refers to the amount of free Ram required on any worker running your application.
Note for both of these resources the maximum value is not your System's max, it is the max as set by the your Spark configuration.
If you need to run multiple Spark apps simultaneously then you’ll need to adjust the amount of cores being used by each app.
If you are working with applications on the same node you need to assign cores to each application to make them work in parallel: ResourceScheduling
If you use VMs (as in your situation): assign only one core to each VM when you first create it or whatever relevant to your system resource capacity as by now spark request
4 cores for each * 2 VMs = 8 core
which you don't have.
This is a tutorial i find that could help you: Install Spark on Ubuntu: Standalone Cluster Mode
Further Reading: common-spark-troubleshooting
Upvotes: 2