Ievgen Nekrashevych

Reputation: 66

gcloud console indicates the job is running, while the Hadoop application manager says it has finished

The job I've submitted to the Spark cluster is not finishing. It stays pending forever, yet the logs show that even the Spark Jetty connector has been shut down:

17/05/23 11:53:39 INFO org.spark_project.jetty.server.ServerConnector: Stopped ServerConnector@4f67e3df{HTTP/1.1}{0.0.0.0:4041}

I'm running the latest Cloud Dataproc v1.1 (Spark 2.0.2) on YARN and submit the Spark job via the gcloud command-line tool:

gcloud dataproc jobs submit spark --project stage --cluster datasys-stg \
--async --jar hdfs:///apps/jdbc-job/jdbc-job.jar --labels name=jdbc-job -- --dbType=test

The equivalent SparkPi example finishes correctly:

gcloud dataproc jobs submit spark --project stage --cluster datasys-stg --async \
 --class org.apache.spark.examples.SparkPi --jars file:///usr/lib/spark/examples/jars/spark-examples.jar -- 100

When I visit the Hadoop application manager interface (port 8088), the application shows as finished with a SUCCEEDED status. The Google Cloud console job list, however, keeps showing it as running until it is killed (the job ran for 20 hours before I killed it, while Hadoop says it ran for 19 seconds). Is there something I can monitor to see what is preventing gcloud from finishing the job?

Upvotes: 2

Views: 294

Answers (1)

Ievgen Nekrashevych

Reputation: 66

I couldn't find anything to monitor that would show why the application wasn't finishing, but I found the actual problem and fixed it. It turned out I had abandoned threads in my application: I held an open connection to RabbitMQ, and that created threads which prevented the application from ever being fully stopped by gcloud.
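
For anyone hitting the same symptom, here is a minimal sketch of the kind of cleanup that resolves it, assuming the standard RabbitMQ Java client (the class name, host, and the actual work are placeholders, not my real job code):

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class JdbcJob {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("rabbitmq-host"); // placeholder host

        Connection connection = factory.newConnection();
        try {
            Channel channel = connection.createChannel();
            // ... consume messages / run the JDBC work here ...
        } finally {
            // The RabbitMQ client spawns non-daemon worker threads.
            // Closing the connection shuts them down; otherwise the
            // driver JVM never exits, and Dataproc keeps reporting the
            // job as running even after YARN marks it finished.
            connection.close();
        }
    }
}

A thread dump of the driver JVM (for example with jstack) should reveal any such lingering non-daemon threads.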

Upvotes: 1
