BAR

Reputation: 17161

Google Dataproc Jobs Never Cancel, Stop, or Terminate

I have been using Google Dataproc for a few weeks now and since I started I had a problem with canceling and stopping jobs.

It seems like there must be some server other than those created on cluster setup, that keeps track of and supervises jobs.

I have never had a process that does its job without error actually stop when I hit stop in the dev console. The spinner just keeps spinning and spinning.

Cluster restart or stop does nothing, even if stopped for hours.

Only when the cluster is entirely deleted will the jobs disappear... (But wait there's more!) If you create a new cluster with the same settings, before the previous cluster's jobs have been deleted, the old jobs will start on the new cluster!!!

I have seen jobs that terminate on their own due to OOM errors restart themselves after cluster restart! (with no coding for this sort of fault tolerance on my side)

How can I forcefully stop Dataproc jobs? (gcloud beta dataproc jobs kill does not work)

Does anyone know what is going on with these seemingly related issues?

Is there a special way to shutdown a Spark job to avoid these issues?

Upvotes: 3

Views: 3293

Answers (1)

James

Reputation: 2331

Jobs keep running

In some cases, errors are not successfully reported to the Cloud Dataproc service. Thus, if a job fails, it can appear to run forever even though it has (probably) already failed on the back end. This should be fixed by a soon-to-be-released version of Dataproc in the next 1-2 weeks.

Job starts after restart

This would be unintended and undesirable. We have tried to replicate this issue and cannot. If anyone can replicate this reliably, we'd like to know so we can fix it! This may be related to the issue above, where a job has failed but still appears to be running, even after a cluster restarts.

Best way to shutdown

Ideally, the best way to shut down a Cloud Dataproc cluster is to terminate the cluster and start a new one. If that is problematic, you can try a bulk restart of the Compute Engine VMs; it will be much easier to create a new cluster, however.
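As a sketch, the delete-and-recreate path looks like the following. The job ID, cluster name, and zone are placeholders; the `jobs kill` attempt may hang if the job's state is stuck, which is exactly the symptom described in the question.

```shell
# Try to cancel the job first (may not work if the job state is stuck):
gcloud beta dataproc jobs kill my-job-id

# If the job never leaves the running state, delete the cluster outright:
gcloud beta dataproc clusters delete my-cluster

# Wait until the old cluster and its jobs are fully gone before recreating
# a cluster with the same name, to avoid old jobs starting on the new cluster:
gcloud beta dataproc clusters create my-cluster --zone us-central1-a
```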

Upvotes: 1
