Reputation: 447
I've got project A with Storage buckets A_B1 and A_B2. Now, Dataproc jobs running from project B need to have read access to buckets A_B1 and A_B2. Is that possible somehow?
Motivation: project A is the production environment, with production data stored in Storage. Project B is an "experimental" environment running experimental Spark jobs on the production data. The goal is obviously to separate billing for the production and experimental environments. Something similar could be done with a dev environment.
Upvotes: 2
Views: 1376
Reputation: 10697
Indeed, the Dataproc cluster will be acting on behalf of a service account in project "B"; generally it'll be the default GCE service account, but this is also customizable to use any other service account you create inside of project B.
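For instance, here is a minimal sketch of creating a cluster in project B that runs as a custom service account instead of the default GCE one; the cluster name, region, and service account email are illustrative assumptions, not values from the question:

# Create the Dataproc cluster in project B, running as a custom service account
# (the service account must already exist in project B)
gcloud dataproc clusters create my-experiment-cluster \
    --project=project-b \
    --region=us-central1 \
    --service-account=spark-experiments@project-b.iam.gserviceaccount.com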
You can double check the service account name by getting the details of one of the VMs in your Dataproc cluster, for example by running:
gcloud compute instances describe my-dataproc-cluster-m
It might look something like <project-number>-compute@developer.gserviceaccount.com.
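If you only want the email itself, a gcloud output projection can narrow down the same describe call; this is a sketch, and you may also need to pass --zone for your cluster's zone:

# Print only the service-account email attached to the Dataproc master VM
gcloud compute instances describe my-dataproc-cluster-m \
    --format='value(serviceAccounts[].email)'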
Now, in your case, if you already have data in A_B1 and A_B2, you would have to recursively edit the permissions on all the contents of those buckets to add read access for your service account, using something like:

gsutil -m acl ch -r -u <project-number>-compute@developer.gserviceaccount.com:R gs://foo-bucket

While you're at it, you might also want to change each bucket's "default ACL" so that new objects also get that permission (see the sketch below). This could get tedious to do for lots of projects, so if planning ahead, you could either:
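For example, a sketch of both steps for one of your buckets, assuming the default compute service account and using A_B1 (lowercased) as the bucket name:

# Grant read (R) on all existing objects in the bucket, recursively and in parallel
gsutil -m acl ch -r -u <project-number>-compute@developer.gserviceaccount.com:R gs://a_b1

# Add read to the bucket's default object ACL so newly written objects get it too
gsutil defacl ch -u <project-number>-compute@developer.gserviceaccount.com:R gs://a_b1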
Upvotes: 2