Anuj

Reputation: 1014

Unable to Write to BigQuery - Permission Denied: Apache Beam Python - Google Dataflow

I have been using the Apache Beam Python SDK with the Google Cloud Dataflow service for quite some time now.

I was setting up Dataflow for a new project.

The Dataflow pipeline:

  1. Reads data from Google Datastore
  2. Processes it
  3. Writes the results to Google BigQuery (a minimal sketch of the pipeline follows below)
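
For context, here is roughly what such a pipeline looks like. The project, dataset, and table names are taken from the error message below; the Datastore kind, the table schema, the temp bucket, and the row-conversion function are placeholders, not the actual code:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions
    from apache_beam.io.gcp.datastore.v1new.datastoreio import ReadFromDatastore
    from apache_beam.io.gcp.datastore.v1new.types import Query


    def entity_to_row(entity):
        # Placeholder conversion; the real mapping depends on the entity schema.
        return {'key': entity.key.path_elements[-1],
                'value': entity.properties.get('value')}


    options = PipelineOptions(
        runner='DataflowRunner',
        project='devel-project-abc',           # project from the error message
        temp_location='gs://some-bucket/tmp',  # placeholder bucket
        region='us-central1',
    )

    with beam.Pipeline(options=options) as p:
        (p
         | 'read from datastore' >> ReadFromDatastore(
             Query(kind='SomeKind', project='devel-project-abc'))  # placeholder kind
         | 'convert to table rows' >> beam.Map(entity_to_row)
         | 'write to bq' >> beam.io.WriteToBigQuery(
             table='TableABC',
             dataset='DatasetABC',
             project='devel-project-abc',
             schema='key:STRING,value:STRING',  # placeholder schema
             # CREATE_IF_NEEDED is the default and is exactly what requires
             # the bigquery.tables.create permission cited in the error.
             create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
             write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))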

I have similar pipelines on other projects, and they run perfectly fine.

Today, when I started a Dataflow job, the pipeline started, read data from Datastore, and processed it, but when it was about to write to BigQuery, it failed with:

    apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException:
    Dataflow pipeline failed. State: FAILED, Error:
    Workflow failed. Causes: S04:read from datastore/GroupByKey/Read+read
    from datastore/GroupByKey/GroupByWindow+read from datastore/Values+read
    from datastore/Flatten+read from datastore/Read+convert to table
    rows+write to bq/NativeWrite failed., BigQuery import job
    "dataflow_job_8287310405217525944" failed., BigQuery creation of import
    job for table "TableABC" in dataset "DatasetABC" in project
    "devel-project-abc" failed., BigQuery execution failed., Error:
    Message: Access Denied: Dataset devel-project-abc:DatasetABC: The user
    [email protected]
    does not have bigquery.tables.create permission for dataset
    devel-project-abc:DatasetABC: HTTP Code: 403

I made sure all the required APIs are enabled. As far as I can tell, the service account has the necessary permissions.

My question is: where might this be going wrong?
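
One way to double-check who actually has access to the dataset is to list its access entries with the BigQuery client library; below is a small sketch using the project and dataset names from the error message:

    from google.cloud import bigquery

    # The Dataflow worker's service account should show up here with at
    # least WRITER access (dataset-level WRITER corresponds to
    # roles/bigquery.dataEditor, which includes bigquery.tables.create).
    client = bigquery.Client(project='devel-project-abc')
    dataset = client.get_dataset('devel-project-abc.DatasetABC')
    for entry in dataset.access_entries:
        print(entry.role, entry.entity_type, entry.entity_id)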

Update

From what I remember on previous projects (three different projects, to be precise), I didn't give the Dataflow service agent any specific permissions. The Compute Engine service agent had permissions like Dataflow Admin, Editor, and Dataflow Viewer. Hence, before giving the service agent BigQuery-related permissions, I would like to know why this environment is behaving differently from the previous projects.

Were there any permission or policy changes in the last few months that now make an explicit BigQuery write permission a requirement?

Upvotes: 4

Views: 3233

Answers (3)

Soliman

Reputation: 1204

Your question is not clear. If you are asking why Dataflow can't write to BigQuery, then the answer relates to the permissions you gave the service account you are using; check Michael Moursalimov's answer.

But if you are asking what the difference between your old projects and the new one is, then I can't answer that; ask GCP support, or just spend more time comparing the settings of both projects.

Upvotes: -1

Amruth Bahadursha

Reputation: 74

You can find the capabilities of each BigQuery role here. If your previous projects were using primitive IAM roles, then you might need to set the corresponding predefined roles correctly. The IAM release notes page, provided here, gives additional information on the updates made to the system.

Upvotes: 0

Michael Moursalimov

Reputation: 306

Please make sure your service account ('[email protected]') has the 'roles/bigquery.dataEditor' role on the dataset 'devel-project-abc:DatasetABC'. Also make sure the 'BigQuery Data Editor' role is enabled for your project.

GCP IAM is where you can check those.
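
For the dataset-level grant specifically, one option is to add the service account as a WRITER on the dataset, which maps to 'roles/bigquery.dataEditor'. A sketch with the BigQuery client library, reusing the names from the error message:

    from google.cloud import bigquery

    client = bigquery.Client(project='devel-project-abc')
    dataset = client.get_dataset('devel-project-abc.DatasetABC')

    # Append a WRITER entry for the Dataflow worker service account;
    # WRITER includes bigquery.tables.create, the permission the job
    # was missing.
    entries = list(dataset.access_entries)
    entries.append(bigquery.AccessEntry(
        role='WRITER',
        entity_type='userByEmail',
        entity_id='[email protected]'))
    dataset.access_entries = entries
    client.update_dataset(dataset, ['access_entries'])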

Upvotes: 2
