Bjoern

Reputation: 445

Data Fusion Provisioning of Dataproc Cluster Fails

I've created a simple pipeline which reads from a SQL Server table and writes to a BigQuery table. I then configure it to use Spark, deploy it, and run it. It starts by provisioning the Dataproc cluster, and I can see that it fairly quickly creates 3 VMs: one master and two workers. The main cluster creation job stays at "provisioning" though, both in the Dataproc UI and in the Data Fusion UI. After about 17 minutes it fails.

I've tried both in an enterprise instance and a basic instance. I've made sure that the instance service account has the "Cloud Data Fusion API Service Agent" role. I've run the preview, which runs in around 20 seconds and succeeds.

This is the log:

2019-06-21 10:59:37,011 - DEBUG [provisioning-service-3:i.c.c.i.p.t.ProvisioningTask@121] - Executing PROVISION subtask REQUESTING_CREATE for program run program_run:default.Load_From_BIQ_v1.-SNAPSHOT.workflow.DataPipelineWorkflow.a7999324-9413-11e9-a296-564a3b7813c8.
2019-06-21 10:59:42,087 - INFO  [provisioning-service-3:i.c.c.r.s.p.d.DataprocProvisioner@171] - Creating Dataproc cluster cdap-loadfromb-a7999324-9413-11e9-a296-564a3b7813c8 with system labels {goog-datafusion-version=6_0, cdap-version=6_0_1-1559673739218, goog-datafusion-edition=basic}
2019-06-21 10:59:45,446 - DEBUG [provisioning-service-3:i.c.c.i.p.t.ProvisioningTask@125] - Completed PROVISION subtask REQUESTING_CREATE for program run program_run:default.Load_From_BIQ_v1.-SNAPSHOT.workflow.DataPipelineWorkflow.a7999324-9413-11e9-a296-564a3b7813c8.
2019-06-21 10:59:45,461 - DEBUG [provisioning-service-3:i.c.c.i.p.t.ProvisioningTask@121] - Executing PROVISION subtask POLLING_CREATE for program run program_run:default.Load_From_BIQ_v1.-SNAPSHOT.workflow.DataPipelineWorkflow.a7999324-9413-11e9-a296-564a3b7813c8.
2019-06-21 10:59:46,402 - DEBUG [provisioning-service-3:i.c.c.i.p.t.ProvisioningTask@125] - Completed PROVISION subtask POLLING_CREATE for program run program_run:default.Load_From_BIQ_v1.-SNAPSHOT.workflow.DataPipelineWorkflow.a7999324-9413-11e9-a296-564a3b7813c8.
(...)
2019-06-21 11:17:31,345 - DEBUG [provisioning-service-3:i.c.c.i.p.t.ProvisioningTask@121] - Executing PROVISION subtask REQUESTING_DELETE for program run program_run:default.Load_From_BIQ_v1.-SNAPSHOT.workflow.DataPipelineWorkflow.a7999324-9413-11e9-a296-564a3b7813c8.
2019-06-21 11:17:32,753 - DEBUG [provisioning-service-3:i.c.c.i.p.t.ProvisioningTask@125] - Completed PROVISION subtask REQUESTING_DELETE for program run program_run:default.Load_From_BIQ_v1.-SNAPSHOT.workflow.DataPipelineWorkflow.a7999324-9413-11e9-a296-564a3b7813c8.
2019-06-21 11:17:32,769 - DEBUG [provisioning-service-3:i.c.c.i.p.t.ProvisioningTask@121] - Executing PROVISION subtask POLLING_DELETE for program run program_run:default.Load_From_BIQ_v1.-SNAPSHOT.workflow.DataPipelineWorkflow.a7999324-9413-11e9-a296-564a3b7813c8.
2019-06-21 11:17:33,588 - DEBUG [provisioning-service-3:i.c.c.i.p.t.ProvisioningTask@125] - Completed PROVISION subtask POLLING_DELETE for program run program_run:default.Load_From_BIQ_v1.-SNAPSHOT.workflow.DataPipelineWorkflow.a7999324-9413-11e9-a296-564a3b7813c8.
2019-06-21 11:17:33,601 - DEBUG [provisioning-service-3:i.c.c.i.p.t.ProvisioningTask@112] - Completed PROVISION task for program run program_run:default.Load_From_BIQ_v1.-SNAPSHOT.workflow.DataPipelineWorkflow.a7999324-9413-11e9-a296-564a3b7813c8.
2019-06-21 11:17:35,946 - DEBUG [provisioning-service-4:i.c.c.i.p.t.ProvisioningTask@121] - Executing DEPROVISION subtask REQUESTING_DELETE for program run program_run:default.Load_From_BIQ_v1.-SNAPSHOT.workflow.DataPipelineWorkflow.a7999324-9413-11e9-a296-564a3b7813c8.
2019-06-21 11:17:37,219 - ERROR [provisioning-service-4:i.c.c.i.p.t.ProvisioningTask@151] - DEPROVISION task failed in REQUESTING_DELETE state for program run program_run:default.Load_From_BIQ_v1.-SNAPSHOT.workflow.DataPipelineWorkflow.a7999324-9413-11e9-a296-564a3b7813c8.
com.google.api.gax.rpc.FailedPreconditionException: io.grpc.StatusRuntimeException: FAILED_PRECONDITION: Cannot delete cluster 'cdap-loadfromb-a7999324-9413-11e9-a296-564a3b7813c8' while it has other pending delete operations.
    at com.google.api.gax.rpc.ApiExceptionFactory.createException(ApiExceptionFactory.java:59) ~[na:na]
    at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:72) ~[na:na]
    at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:60) ~[na:na]
    at com.google.api.gax.grpc.GrpcExceptionCallable$ExceptionTransformingFuture.onFailure(GrpcExceptionCallable.java:95) ~[na:na]
    at com.google.api.core.ApiFutures$1.onFailure(ApiFutures.java:61) ~[na:na]
    at com.google.common.util.concurrent.Futures$4.run(Futures.java:1123) ~[com.google.guava.guava-13.0.1.jar:na]
    at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:435) ~[na:na]
    at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:900) ~[com.google.guava.guava-13.0.1.jar:na]
    at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:811) ~[com.google.guava.guava-13.0.1.jar:na]
    at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:675) ~[com.google.guava.guava-13.0.1.jar:na]
    at io.grpc.stub.ClientCalls$GrpcFuture.setException(ClientCalls.java:492) ~[na:na]
    at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:467) ~[na:na]
    at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:41) ~[na:na]
    at io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:684) ~[na:na]
    at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:41) ~[na:na]
    at io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:392) ~[na:na]
    at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:475) ~[na:na]
    at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:63) ~[na:na]
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:557) ~[na:na]
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$600(ClientCallImpl.java:478) ~[na:na]
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:590) ~[na:na]
    at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) ~[na:na]
    at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) ~[na:na]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_212]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_212]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[na:1.8.0_212]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[na:1.8.0_212]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_212]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_212]
    at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_212]
Caused by: io.grpc.StatusRuntimeException: FAILED_PRECONDITION: Cannot delete cluster 'cdap-loadfromb-a7999324-9413-11e9-a296-564a3b7813c8' while it has other pending delete operations.
    at io.grpc.Status.asRuntimeException(Status.java:526) ~[na:na]
    ... 19 common frames omitted
2019-06-21 11:17:37,235 - DEBUG [provisioning-service-4:i.c.c.i.p.t.ProvisioningTask@159] - Terminated DEPROVISION task for program run program_run:default.Load_From_BIQ_v1.-SNAPSHOT.workflow.DataPipelineWorkflow.a7999324-9413-11e9-a296-564a3b7813c8 due to exception.

Upvotes: 4

Views: 2621

Answers (2)

Yasin Ozer

Reputation: 71

Make sure Data Fusion is running on a network that has the default firewall rules. If you are using a new VPC without the default network's firewall rules, you might run into this problem. Try running Data Fusion on the default VPC network by setting the following property:

"system.profile.properties.network=default"

Upvotes: 0

Ali Anwar

Reputation: 431

Because the Dataproc cluster remains in "provisioning", my suspicion is that the network being used for the Dataproc cluster is not configured such that the nodes of the Dataproc cluster can communicate with each other. For more information on this, see https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/network#overview.
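If the network turns out to be missing a rule for internal traffic, one possible fix is to add an ingress firewall rule that lets the cluster nodes reach each other. The following is only a sketch: the rule name `allow-dataproc-internal`, the network name `my-vpc`, and the source range `10.128.0.0/9` are placeholders, not values from the question, and should be replaced with your own VPC and subnet range. The helper only builds and prints the `gcloud` command so you can review it before running it yourself.

```python
import shlex


def firewall_rule_command(rule_name, network, source_range):
    """Build a gcloud command that allows internal TCP, UDP and ICMP ingress.

    All three arguments are caller-supplied placeholders; nothing here is
    taken from the failing cluster's actual configuration.
    """
    return [
        "gcloud", "compute", "firewall-rules", "create", rule_name,
        f"--network={network}",
        "--direction=INGRESS",
        # Open all TCP and UDP ports plus ICMP between cluster nodes.
        "--allow=tcp:0-65535,udp:0-65535,icmp",
        # Restrict the rule to traffic from inside the subnet range.
        f"--source-ranges={source_range}",
    ]


if __name__ == "__main__":
    # Hypothetical names; substitute your own network and IP range.
    cmd = firewall_rule_command("allow-dataproc-internal", "my-vpc", "10.128.0.0/9")
    print(shlex.join(cmd))
```

Limiting `--source-ranges` to the subnet's own range keeps the rule internal rather than opening the nodes to outside traffic.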

Upvotes: 3
