Reputation: 3069
My spark Job is submmited by oozie in hue. The spark is running in yarn-cluster mode. I am trying to monitor the running application status by so-called driver's 4040 port, but I cannot find the 4040 port, I check the process:
appuser 137872 137870 0 18:55 ? 00:00:00 /bin/bash -c /home/jdk/bin/java -server -Xmx4096m -Djava.io.tmpdir=/data6/data/hadoop/tmp/usercache/appuser/appcache/application_1493800575189_0547/container_1493800575189_0547_01_000004/tmp '-Dspark.driver.port=36503' '-Dspark.ui.port=0' -Dspark.yarn.app.container.log.dir=/home/log/hadoop/logs/userlogs/application_1493800575189_0547/container_1493800575189_0547_01_000004 -XX:OnOutOfMemoryError='kill %p' org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://[email protected]:36503 --executor-id 3 --hostname 10.120.117.100 --cores 1 --app-id application_1493800575189_0547 --user-class-path file:/data6/data/hadoop/tmp/usercache/appuser/appcache/application_1493800575189_0547/container_1493800575189_0547_01_000004/__app__.jar 1> /home/log/hadoop/logs/userlogs/application_1493800575189_0547/container_1493800575189_0547_01_000004/stdout 2> /home/log/hadoop/logs/userlogs/application_1493800575189_0547/container_1493800575189_0547_01_000004/stderr
appuser 138337 137872 99 18:55 ? 00:05:11 /home/jdk/bin/java -server -Xmx4096m -Djava.io.tmpdir=/data6/data/hadoop/tmp/usercache/appuser/appcache/application_1493800575189_0547/container_1493800575189_0547_01_000004/tmp -Dspark.driver.port=36503 -Dspark.ui.port=0 -Dspark.yarn.app.container.log.dir=/home/log/hadoop/logs/userlogs/application_1493800575189_0547/container_1493800575189_0547_01_000004 -XX:OnOutOfMemoryError=kill %p org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://[email protected]:36503 --executor-id 3 --hostname 10.120.117.100 --cores 1 --app-id application_1493800575189_0547 --user-class-path file:/data6/data/hadoop/tmp/usercache/appuser/appcache/application_1493800575189_0547/container_1493800575189_0547_01_000004/__app__.jar
I don't know why the spark.ui.port is 0 instead of 4040. Of course , the port 0 is not allowed by my linux system.So , I cannot monitor the application status from REST api.
Anybody can give me some suggestions?
Great thanks for the answer from Mariusz, is the below a spark ApplicationMaster process?
[appuser@hz-10-120-117-100 bin]$ ps -ef|grep ApplicationMaster
appuser 125805 125803 0 May03 ? 00:00:00 /bin/bash -c /home/jdk/bin/java -server -Xmx1024m -Djava.io.tmpdir=/data6/data/hadoop/tmp/usercache/appuser/appcache/application_1493800575189_0014/container_1493800575189_0014_01_000001/tmp -Dspark.yarn.app.container.log.dir=/home/log/hadoop/logs/userlogs/application_1493800575189_0014/container_1493800575189_0014_01_000001 org.apache.spark.deploy.yarn.ApplicationMaster --class 'com.netease.ecom.data.gjs.statis.online.app.day.AppDayRealtimeStatis' --jar hdfs://datahdfsmaster/user/appuser/bjmazhengbing/jar/spark_streaming/spark-streaming-etl-2.0.jar --arg 'analysis_gjs_online.properties' --arg 'rrr' --properties-file /data6/data/hadoop/tmp/usercache/appuser/appcache/application_1493800575189_0014/container_1493800575189_0014_01_000001/__spark_conf__/__spark_conf__.properties 1> /home/log/hadoop/logs/userlogs/application_1493800575189_0014/container_1493800575189_0014_01_000001/stdout 2> /home/log/hadoop/logs/userlogs/application_1493800575189_0014/container_1493800575189_0014_01_000001/stderr
According to the official document of Spark, the driver program should have a 4040 port which is used for monitoring, but my driver program seems didn't open any port:
[appuser@hz-10-120-117-100 bin]$ netstat -ntlp|grep 125805
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
What I finally purpose of finding out the driver port is to monitor the application status. Any suggestions?
Upvotes: 4
Views: 7441
Reputation: 13926
The process you listed is an executor, not a driver.
When you run application in yarn-cluster
mode, spark driver and yarn application master run in the same JVM. So the easiest way to determine Spark UI address is to go to recource manager's UI, find your application and click a link to Application Master. This will be a proxy adress, pointing to driver's ui port.
Upvotes: 9