Reputation: 646
I'm working on an AWS EC2 instance where I installed Spark 2.2.0; the instance has 8 GB of RAM and 2 cores.
I was following this tutorial to play a little with pyspark shell:
https://sparkour.urizone.net/recipes/managing-clusters/
I started the master and one slave worker, and both show up in the web UI.
However, in the shell, when I try to execute a command like:
>>> tf = spark.sparkContext.textFile('README.md')
>>> tf.count()
I get this:
[Stage 0:> (0 + 0) / 2]
17/08/29 11:02:51 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
In my spark-env.sh, I set variables like this:
SPARK_LOCAL_IP=127.0.0.1
SPARK_MASTER_HOST=127.0.0.1
SPARK_WORKER_INSTANCES=2
SPARK_WORKER_MEMORY=1000m
SPARK_WORKER_CORES=1
So I don't know why there is a problem. I guess the pyspark shell isn't reaching the slave worker properly.
Upvotes: 1
Views: 480
Reputation: 2094
In this setup I would start Spark with settings like this:
spark-shell (or spark-submit) --master local[*] --driver-memory 4G ...
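For example (a sketch, assuming README.md sits in the directory you launch from), you could start the pyspark shell in local mode and rerun the same test:

pyspark --master local[*] --driver-memory 4g

>>> tf = spark.sparkContext.textFile('README.md')
>>> tf.count()

With local[*], the driver and the executors run inside a single JVM using all available cores, so there is no separate worker that has to register with a master, and the "Initial job has not accepted any resources" warning no longer applies.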
From one of my comments:
With such a small machine, I suspect you won't be able to run in cluster mode. The thing is that the Spark driver takes resources in addition to the two workers. In this scenario you have 1 core for the driver + 2 workers * 1 core each, which is already more than the 2 cores available. You could try reducing the number of workers to 1, and that should work.
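If you still want the standalone cluster, a minimal sketch of spark-env.sh that leaves a core free for the driver on a 2-core machine could look like this (the memory values are assumptions, not something I tested on your instance):

SPARK_LOCAL_IP=127.0.0.1
SPARK_MASTER_HOST=127.0.0.1
SPARK_WORKER_INSTANCES=1   # one worker, so the driver still gets a core
SPARK_WORKER_CORES=1
SPARK_WORKER_MEMORY=2g

Then point the shell at the master and cap what the job asks for, for example:

pyspark --master spark://127.0.0.1:7077 --executor-memory 1g --total-executor-cores 1

That way the requested executor resources fit inside what the single worker advertises, which is exactly the check that was failing in the warning you saw.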
Upvotes: -1