Reputation: 15524
I just built a Spark 2.0 standalone single-node cluster on Ubuntu 14. I'm trying to submit a PySpark job:
~/spark/spark-2.0.0$ bin/spark-submit --driver-memory 1024m --executor-memory 1024m --executor-cores 1 --master spark://ip-10-180-191-14:7077 examples/src/main/python/pi.py
Spark gives me this message:
WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
Here is the complete output:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/07/27 17:45:18 INFO SparkContext: Running Spark version 2.0.0
16/07/27 17:45:18 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/07/27 17:45:18 INFO SecurityManager: Changing view acls to: ubuntu
16/07/27 17:45:18 INFO SecurityManager: Changing modify acls to: ubuntu
16/07/27 17:45:18 INFO SecurityManager: Changing view acls groups to:
16/07/27 17:45:18 INFO SecurityManager: Changing modify acls groups to:
16/07/27 17:45:18 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ubuntu); groups with view permissions: Set(); users with modify permissions: Set(ubuntu); groups with modify permissions: Set()
16/07/27 17:45:19 INFO Utils: Successfully started service 'sparkDriver' on port 36842.
16/07/27 17:45:19 INFO SparkEnv: Registering MapOutputTracker
16/07/27 17:45:19 INFO SparkEnv: Registering BlockManagerMaster
16/07/27 17:45:19 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-e25f3ae9-be1f-4ea3-8f8b-b3ff3ec7e978
16/07/27 17:45:19 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
16/07/27 17:45:19 INFO SparkEnv: Registering OutputCommitCoordinator
16/07/27 17:45:19 INFO log: Logging initialized @1986ms
16/07/27 17:45:19 INFO Server: jetty-9.2.16.v20160414
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@4674e929{/jobs,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@1adab7c7{/jobs/json,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@26296937{/jobs/job,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@7ef4a753{/jobs/job/json,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@1f282405{/stages,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@5083cca8{/stages/json,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@3d8e675e{/stages/stage,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@661b8183{/stages/stage/json,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@384d9949{/stages/pool,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@7665e464{/stages/pool/json,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@381fc961{/storage,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@2325078{/storage/json,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@566116a6{/storage/rdd,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@f7e9eca{/storage/rdd/json,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@496c0a85{/environment,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@59cd2240{/environment/json,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@747dbf9{/executors,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@7c349d15{/executors/json,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@55259834{/executors/threadDump,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@65ca7ff2{/executors/threadDump/json,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@5c6be8a1{/static,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@4ef1a0c{/,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@7df2d69d{/api,null,AVAILABLE}
16/07/27 17:45:19 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@4b71033e{/stages/stage/kill,null,AVAILABLE}
16/07/27 17:45:19 INFO ServerConnector: Started ServerConnector@646986bc{HTTP/1.1}{0.0.0.0:4040}
16/07/27 17:45:19 INFO Server: Started @2150ms
16/07/27 17:45:19 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/07/27 17:45:19 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.180.191.14:4040
16/07/27 17:45:19 INFO Utils: Copying /home/ubuntu/spark/spark-2.0.0/examples/src/main/python/pi.py to /tmp/spark-ee1ceb06-a7c4-4b18-8577-adb02f97f31e/userFiles-565d5e0b-5879-40d3-8077-d9d782156818/pi.py
16/07/27 17:45:19 INFO SparkContext: Added file file:/home/ubuntu/spark/spark-2.0.0/examples/src/main/python/pi.py at spark://10.180.191.14:36842/files/pi.py with timestamp 1469641519759
16/07/27 17:45:19 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://ip-10-180-191-14:7077...
16/07/27 17:45:19 INFO TransportClientFactory: Successfully created connection to ip-10-180-191-14/10.180.191.14:7077 after 25 ms (0 ms spent in bootstraps)
16/07/27 17:45:20 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20160727174520-0006
16/07/27 17:45:20 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 39047.
16/07/27 17:45:20 INFO NettyBlockTransferService: Server created on 10.180.191.14:39047
16/07/27 17:45:20 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.180.191.14, 39047)
16/07/27 17:45:20 INFO BlockManagerMasterEndpoint: Registering block manager 10.180.191.14:39047 with 366.3 MB RAM, BlockManagerId(driver, 10.180.191.14, 39047)
16/07/27 17:45:20 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.180.191.14, 39047)
16/07/27 17:45:20 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@2bc4029c{/metrics/json,null,AVAILABLE}
16/07/27 17:45:20 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
16/07/27 17:45:20 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@60378632{/SQL,null,AVAILABLE}
16/07/27 17:45:20 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@6491578b{/SQL/json,null,AVAILABLE}
16/07/27 17:45:20 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@9ae3f78{/SQL/execution,null,AVAILABLE}
16/07/27 17:45:20 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@3c80379{/SQL/execution/json,null,AVAILABLE}
16/07/27 17:45:20 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@245146b3{/static/sql,null,AVAILABLE}
16/07/27 17:45:20 INFO SharedState: Warehouse path is 'file:/home/ubuntu/spark/spark-2.0.0/spark-warehouse'.
16/07/27 17:45:20 INFO SparkContext: Starting job: reduce at /home/ubuntu/spark/spark-2.0.0/examples/src/main/python/pi.py:43
16/07/27 17:45:20 INFO DAGScheduler: Got job 0 (reduce at /home/ubuntu/spark/spark-2.0.0/examples/src/main/python/pi.py:43) with 2 output partitions
16/07/27 17:45:20 INFO DAGScheduler: Final stage: ResultStage 0 (reduce at /home/ubuntu/spark/spark-2.0.0/examples/src/main/python/pi.py:43)
16/07/27 17:45:20 INFO DAGScheduler: Parents of final stage: List()
16/07/27 17:45:20 INFO DAGScheduler: Missing parents: List()
16/07/27 17:45:20 INFO DAGScheduler: Submitting ResultStage 0 (PythonRDD[1] at reduce at /home/ubuntu/spark/spark-2.0.0/examples/src/main/python/pi.py:43), which has no missing parents
16/07/27 17:45:20 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 4.6 KB, free 366.3 MB)
16/07/27 17:45:21 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 3.0 KB, free 366.3 MB)
16/07/27 17:45:21 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.180.191.14:39047 (size: 3.0 KB, free: 366.3 MB)
16/07/27 17:45:21 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1012
16/07/27 17:45:21 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (PythonRDD[1] at reduce at /home/ubuntu/spark/spark-2.0.0/examples/src/main/python/pi.py:43)
16/07/27 17:45:21 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
16/07/27 17:45:36 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
16/07/27 17:45:51 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
I'm not running Spark on top of Hadoop or YARN, just by itself, standalone. What can I do to make Spark process these jobs?
Upvotes: 1
Views: 625
Reputation: 2958
Setting the master to local, as suggested in the other answer, will only make your program run in local mode, which is fine for beginners or small workloads on a single machine, but it will not run on a cluster. To run your program on a real cluster (and possibly across multiple machines), you need to start a master and slaves using the scripts that ship with Spark:
<spark-install-dir>/sbin/start-master.sh
Your slaves (you must have at least one) should be started using:
<spark-install-dir>/sbin/start-slave.sh spark://<master-address>:7077
This way you will be running in real cluster mode, and the UI will show your workers and jobs. The main (master) UI is on port 8080 of the master machine. Port 4040 on the machine running the driver shows the application UI, and port 8081 shows the worker UI (if you run several slaves on the same machine, the ports will be 8081 for the first, 8082 for the second, and so on).
You can start as many slaves as you want, on as many machines as you want, and specify the number of cores for each one. It is also possible to run several slaves on the same machine; just give each one an appropriate share of cores and RAM so you don't confuse the scheduler. See the sketch below.
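As a rough end-to-end sketch on a single node (the <master-address> placeholder and the core/memory values are assumptions you should adapt to your machine):

# start the master; its web UI will come up on port 8080
<spark-install-dir>/sbin/start-master.sh

# start one worker and register it with the master, giving it 1 core and 1 GB of RAM
<spark-install-dir>/sbin/start-slave.sh spark://<master-address>:7077 --cores 1 --memory 1g

# submit the example against the cluster master instead of local mode
<spark-install-dir>/bin/spark-submit --master spark://<master-address>:7077 --executor-memory 1024m examples/src/main/python/pi.py

Once the worker shows up on the master UI (port 8080) with free cores and memory, the "Initial job has not accepted any resources" warning should stop and the tasks should get scheduled.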
Upvotes: 1
Reputation: 1044
Try setting the master to local, like this, in order to run in local mode:
~/spark/spark-2.0.0$ bin/spark-submit --driver-memory 1024m --executor-memory 1024m --executor-cores 1 --master local[2] examples/src/main/python/pi.py
You may also need to use the
--py-files
option. See Spark submit options.
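For example, if your script imported helper modules (helpers.py here is just a hypothetical dependency; the bundled pi.py does not actually need it), you could ship them to the executors like this:

# --py-files distributes extra .py/.zip/.egg files to the executors
bin/spark-submit --master local[2] --py-files helpers.py examples/src/main/python/pi.py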
Upvotes: 2