Reputation: 1293
My situation is as follows:
Google Account A has some data in BigQuery.
Google Account B manages Account A's BigQuery data, and has also been given editor privileges for Account A's Cloud Platform project.
Account B has a Sheet in Google Drive that has some cool reference data in it. Account B logs into the BQ Web console, and creates a table in Account A's BQ project that is backed by this sheet.
All is well. Account B can successfully query this table and join it against Account A's BQ data from the web UI.
Problem:
Google Account A also has a service account that is an editor for Google Account A's Cloud Platform Project. This service account manages and queries the data in BQ using the python google-cloud API. When this service account attempts to query the reference table that is backed by Account B's GDrive Sheet, the job fails with this error:
Encountered an error while globbing file pattern. JobID: "testing_gdrivesheet_query_job1"
As near as I can tell, this is actually an authentication issue. How can I give Account A's service account appropriate access to Account B's GDrive so it can access that reference table?
Bonus Points: Is there any performance difference between a table backed by a GDrive Sheet vs a native BQ table?
Upvotes: 5
Views: 7535
Reputation: 85
Just one step to add to Evan Kaeding's answer: you can find the Airflow connection in the Airflow UI under "Admin" -> "Connections" -> choose your connection. In my case I also needed to add the keyfile path or the keyfile JSON of the service account to the Airflow connection,
based on this reference: https://cloud.google.com/composer/docs/how-to/managing/connections#creating_a_connection_to_another_project
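If you prefer to create the connection in code instead of the UI, here is a minimal sketch (the connection ID, key path, and project are placeholders, and the extra__google_cloud_platform__* field names vary across Airflow versions):
import json
from airflow.models import Connection
from airflow.settings import Session

# Placeholder connection ID and key path; adjust for your environment.
conn = Connection(
    conn_id="google_cloud_default",
    conn_type="google_cloud_platform",
    extra=json.dumps({
        "extra__google_cloud_platform__key_path": "/path/to/keyfile.json",
        "extra__google_cloud_platform__scope": ",".join([
            "https://www.googleapis.com/auth/bigquery",
            "https://www.googleapis.com/auth/drive",
        ]),
        "extra__google_cloud_platform__project": "my-project",
    }),
)

# Persist the connection in Airflow's metadata database.
session = Session()
session.add(conn)
session.commit()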
Upvotes: 0
Reputation: 192
For those of you trying to do this via Airflow or Google Cloud Composer, there are two main steps you'll need to take:
1. Grant view access on the spreadsheet to the [email protected]. This should be the same service account you're using to access Google BigQuery. This can be done in the Sheets GUI or programmatically (see the sketch after these steps).
2. Add the following scope to your Google Cloud Connection in Airflow: https://www.googleapis.com/auth/drive
You will then be able to query external tables that reference Google Sheets.
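For the programmatic route in step 1, here is a minimal sketch using the Drive v3 API via google-api-python-client (the file ID, service account email, and credentials file are placeholders, and the credentials must belong to an identity that is allowed to share the sheet):
from google.oauth2 import service_account
from googleapiclient.discovery import build

# Placeholders: the sheet's file ID (from its URL) and the service
# account email that needs read access.
SHEET_FILE_ID = "your-sheet-file-id"
SERVICE_ACCOUNT_EMAIL = "your-sa@your-project.iam.gserviceaccount.com"

# Credentials for an identity that can share the sheet (e.g. its owner).
creds = service_account.Credentials.from_service_account_file(
    "sharer-credentials.json",
    scopes=["https://www.googleapis.com/auth/drive"],
)
drive = build("drive", "v3", credentials=creds)

# Grant read-only ("viewer") access to the service account.
drive.permissions().create(
    fileId=SHEET_FILE_ID,
    body={"type": "user", "role": "reader", "emailAddress": SERVICE_ACCOUNT_EMAIL},
).execute()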
Upvotes: 4
Reputation: 10824
While Orbit's answer helped me find a solution to the issue, there are a few more things to consider, so I'd like to add my detailed solution to the problem. This solution is required if Orbit's basic solution does not work, in particular if you use G Suite and your policies do not allow sharing sheets/docs with accounts outside of your domain. In that case you cannot share a doc/sheet with the service account directly.
Before you start:
- Make sure the delegated user [email protected] has access to the sheet.
- Add the required scopes to your service account's OAuth client (you may need to ask a G Suite admin to do this for you):
  - https://www.googleapis.com/auth/bigquery
  - https://www.googleapis.com/auth/drive
If the delegated user can access your drive-based table in the BigQuery UI, your service account should now also be able to access it on behalf of the delegated user.
Here is a full code snippet that worked for me:
#!/usr/bin/env python
import httplib2
from google.cloud import bigquery
from oauth2client.service_account import ServiceAccountCredentials

# Both scopes are required: BigQuery to run the query, Drive to read
# the sheet backing the external table.
scopes = [
    "https://www.googleapis.com/auth/drive",
    "https://www.googleapis.com/auth/bigquery",
]

delegated_user = "[email protected]"
project = 'project-name'
table = 'dataset-name.table-name'
query = 'SELECT count(*) FROM [%s:%s]' % (project, table)  # legacy SQL syntax

# Load the service account key and impersonate the delegated user.
creds = ServiceAccountCredentials.from_json_keyfile_name('secret.json', scopes=scopes)
creds = creds.create_delegated(delegated_user)
http = creds.authorize(httplib2.Http())

# Pass the authorized HTTP client to the BigQuery client.
client = bigquery.Client(http=http)
bq = client.run_sync_query(query)
bq.run()
print(bq.fetch_data())
Note that I was not able to set up the delegation directly and needed to create an HTTP client using creds = creds.create_delegated(delegated_user) and http = creds.authorize(httplib2.Http()). The authorized HTTP client can then be used as the HTTP client for the BigQuery client: client = bigquery.Client(http=http).
Also note that the service account does not need to have any predefined roles assigned in the project settings, i.e., you do not have to make it a BigQuery user or even a project owner. I suppose it acquires access primarily via the delegation.
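For reference, here is a minimal sketch of the same delegated query against newer library versions (google-cloud-bigquery 0.28+ replaced run_sync_query with client.query, and google-auth replaces oauth2client, with delegation done via with_subject); the key file, user, and table names are the same placeholders as above:
from google.oauth2 import service_account
from google.cloud import bigquery

scopes = [
    "https://www.googleapis.com/auth/drive",
    "https://www.googleapis.com/auth/bigquery",
]

# Load the key file and delegate to the user who can access the sheet.
creds = service_account.Credentials.from_service_account_file(
    "secret.json", scopes=scopes)
creds = creds.with_subject("delegated-user@your-domain.com")

client = bigquery.Client(project="project-name", credentials=creds)

# Standard SQL equivalent of the legacy query above.
rows = client.query(
    "SELECT COUNT(*) FROM `project-name.dataset-name.table-name`").result()
for row in rows:
    print(row)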
Upvotes: 5
Reputation: 76
You should be able to get this working with the following steps:
First, share the sheet with the email address (the "service account ID") associated with the service account.
Then you'll be able to access your sheet-backed table if you create a Client with both the bigquery and drive scopes. (You might need to have domain-wide delegation enabled on the service account.)
from google.cloud import bigquery
from oauth2client.service_account import ServiceAccountCredentials

# Placeholders for your project and a query against the sheet-backed table.
PROJECT = 'your-project-id'
q = 'SELECT COUNT(*) FROM [your-dataset.your-sheet-table]'

# Both scopes are needed: BigQuery for the query, Drive for the sheet.
scopes = ['https://www.googleapis.com/auth/bigquery', 'https://www.googleapis.com/auth/drive']
credentials = ServiceAccountCredentials.from_json_keyfile_name(
    '<path_to_json>', scopes=scopes)

# Instantiates a client with the scoped credentials
client = bigquery.Client(project=PROJECT, credentials=credentials)
bqQuery = client.run_sync_query(q)
bqQuery.run()
bqQuery.fetch_data()
Upvotes: 5