Reputation: 353
I currently have some confusion about the credentials/configuration used by Dataflow.
From my experimentation, it seems that Dataflow always uses the default configuration instead of the active configuration. Is that correct? For example, if in my gcloud config the default configuration points to project A while my active configuration points to project B, my Dataflow job always submits to project A. The job also seems to ignore what is set via options.setProject(), so I'm wondering when Dataflow actually uses options.getProject() at all.
I'm also wondering whether there is any way to submit a Dataflow job with a customized configuration, say, to submit multiple jobs to different projects with different credentials in the same run (without manually changing my gcloud config)?
BTW, I am running the Dataflow job on the Cloud Dataflow service but submitting it from a non-GCE machine using a cloud services account, in case that makes a difference.
Upvotes: 5
Views: 5256
Reputation: 1553
The code I used to have Dataflow populate its workers with the service account we wanted (in addition to Lukas' answer above):
import java.io.FileInputStream;
import java.util.Arrays;
import java.util.List;
import com.google.auth.oauth2.ServiceAccountCredentials;

// OAuth scopes the workers will need.
final List<String> SCOPES = Arrays.asList(
    "https://www.googleapis.com/auth/cloud-platform",
    "https://www.googleapis.com/auth/devstorage.full_control",
    "https://www.googleapis.com/auth/userinfo.email",
    "https://www.googleapis.com/auth/datastore",
    "https://www.googleapis.com/auth/pubsub");
// Load the service account key and attach it to the pipeline options.
options.setGcpCredential(
    ServiceAccountCredentials.fromStream(new FileInputStream("key.json"))
        .createScoped(SCOPES));
options.setServiceAccount("[email protected]");
Upvotes: 1
Reputation: 1731
Google Cloud Dataflow by default uses the application default credentials library to get the credentials if they are not specified. The library currently only supports getting the credentials using the gcloud default configuration. Similarly, for the project, Google Cloud Dataflow uses the gcloud default configuration.
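For reference, a minimal sketch of that lookup using the google-auth-library API (getApplicationDefault() throws IOException, so call it from code that handles it):

import com.google.auth.oauth2.GoogleCredentials;

// Resolution order: the GOOGLE_APPLICATION_CREDENTIALS environment variable,
// then the gcloud application-default configuration, then the GCE metadata server.
GoogleCredentials credentials = GoogleCredentials.getApplicationDefault();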
To run jobs with a different project, one can specify it manually on the command line (for example, --project=myProject if using PipelineOptionsFactory.fromArgs) or set the option explicitly using GcpOptions.setProject.
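A minimal sketch of both approaches, assuming the Apache Beam SDK (the project ID is just the placeholder from above):

import org.apache.beam.sdk.extensions.gcp.options.GcpOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

// Option 1: pick up --project=myProject from the command-line arguments.
GcpOptions options = PipelineOptionsFactory.fromArgs(args).as(GcpOptions.class);
// Option 2: override the project explicitly, ignoring the gcloud default.
options.setProject("myProject");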
To run jobs with different credentials, one can construct a credentials object and set it explicitly using GcpOptions.setGcpCredential, or rely on one of the ways the application default credentials library (which Google Cloud Dataflow is tied into) can generate the credentials object automatically. One example would be to use the environment variable GOOGLE_APPLICATION_CREDENTIALS, as explained here.
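Putting the two together, here is a sketch of how one could submit jobs to different projects with different credentials in the same run, as the question asks. It assumes the Apache Beam SDK; the project IDs and key file paths are placeholders:

import java.io.FileInputStream;
import java.util.Arrays;
import com.google.auth.oauth2.GoogleCredentials;
import org.apache.beam.sdk.extensions.gcp.options.GcpOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

// Job 1: project A, authenticated with its own service account key.
GcpOptions optionsA = PipelineOptionsFactory.create().as(GcpOptions.class);
optionsA.setProject("project-a");
optionsA.setGcpCredential(
    GoogleCredentials.fromStream(new FileInputStream("key-a.json"))
        .createScoped(Arrays.asList("https://www.googleapis.com/auth/cloud-platform")));

// Job 2: project B with different credentials, submitted in the same run.
GcpOptions optionsB = PipelineOptionsFactory.create().as(GcpOptions.class);
optionsB.setProject("project-b");
optionsB.setGcpCredential(
    GoogleCredentials.fromStream(new FileInputStream("key-b.json"))
        .createScoped(Arrays.asList("https://www.googleapis.com/auth/cloud-platform")));

// Create and run each pipeline against its own options, e.g.
// Pipeline.create(optionsA).run() and Pipeline.create(optionsB).run().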
Upvotes: 6