Reputation: 347
I am trying to create my first Google Cloud Dataproc cluster using the following command:
gcloud dataproc clusters create hive-cluster \
--scopes sql-admin \
--image-version 1.3 \
--initialization-actions "gs://goog-dataproc-${PROJECT}:${REGION}:hive-metastore" \
--master-machine-type n1-standard-1 \
--master-boot-disk-size 15 \
--num-workers 2 \
--worker-machine-type n1-standard-1 \
--worker-boot-disk-size 15 \
--region us-east1 \
--zone us-east1-b
However, I get the following error:
Dataproc could not validate the initialization action using the service-owned service accounts. Cluster creation may still succeed if the initialization action is accessible from GCE VMs.
Reason: service-1456309104734317@dataproc-accounts.iam.gserviceaccount.com does not have storage.objects.get access to goog-dataproc-initialization-actions-us-east1/cloud-sql-proxy/cloud-sql-proxy.sh.
Waiting for cluster creation operation...done.
ERROR: (gcloud.dataproc.clusters.create) Operation [projects/traits-seater-824109/regions/us-east1/operations/5b36fb82-ade2-3d5f-a6bd-cb1a206bb54e] failed: Multiple Errors:
- Error downloading script 'gs://goog-dataproc-initialization-actions-us-east1/cloud-sql-proxy/cloud-sql-proxy.sh': [email protected] does not have storage.objects.get access to goog-dataproc-initialization-actions-us-east1/cloud-sql-proxy/cloud-sql-proxy.sh.
I checked the permissions in IAM and granted the Storage Object Viewer role to the service accounts mentioned in the error message above, but I still get the same error. Any suggestions on how to get past this error?
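For reference, the grant I did was roughly equivalent to this gcloud command (only the first service account from the error is shown; the second one's address is redacted above):
gcloud projects add-iam-policy-binding ${PROJECT} \
--member "serviceAccount:service-1456309104734317@dataproc-accounts.iam.gserviceaccount.com" \
--role roles/storage.objectViewer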
Upvotes: 1
Views: 2856
Reputation: 10707
There appears to be a temporary issue with the permission settings on Dataproc's regionally hosted copies of the initialization actions. Long term, those regional copies are indeed what you should use, both to better isolate the regional reliability of the init actions and to avoid cross-region copying of them. In the meantime, you can use the shared "global" copy of the init action instead:
gcloud dataproc clusters create hive-cluster \
--initialization-actions gs://dataproc-initialization-actions/cloud-sql-proxy/cloud-sql-proxy.sh \
...
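If you want to confirm the script is readable before retrying the cluster creation, a quick check with your own credentials (assuming gsutil is installed and authenticated) is:
gsutil stat gs://dataproc-initialization-actions/cloud-sql-proxy/cloud-sql-proxy.sh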
Upvotes: 2
Reputation: 11287
The problem may come from the scopes you provided when creating the cluster. You restricted your cluster to the sql-admin API only (https://www.googleapis.com/auth/sqlservice.admin). You may need to add the storage-ro scope (i.e. https://www.googleapis.com/auth/devstorage.read_only) as well:
gcloud dataproc clusters create hive-cluster \
--scopes sql-admin,storage-ro \
[...]
Without the storage-ro scope, even if the bucket goog-dataproc-initialization-actions-us-east1 is public, I think the Dataproc cluster will not be able to retrieve the file from GCS.
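As a sanity check (this uses your own gcloud credentials rather than the cluster's, so it is only illustrative), you can verify that the script itself is publicly readable:
gsutil cat gs://goog-dataproc-initialization-actions-us-east1/cloud-sql-proxy/cloud-sql-proxy.sh | head
If that works from your machine but the cluster still cannot download the script, the missing scope on the cluster's VMs is the likely culprit.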
Upvotes: 1