KevinTydlacka

Reputation: 1293

How do I Authenticate a Service Account to Make Queries against a GDrive-Sheet-Backed BigQuery Table?

My situation is as follows:

Google Account A has some data in BigQuery.

Google Account B manages Account A's BigQuery data, and has also been given editor privileges for Account A's Cloud Platform project.

Account B has a Sheet in Google Drive that has some cool reference data in it. Account B logs into the BQ Web console, and creates a table in Account A's BQ project that is backed by this sheet.

All is well. Account B can query and join to this table successfully within Account A's BQ data from the web UI.

Problem:

Google Account A also has a service account that is an editor for Google Account A's Cloud Platform Project. This service account manages and queries the data in BQ using the python google-cloud API. When this service account attempts to query the reference table that is backed by Account B's GDrive Sheet, the job fails with this error:

Encountered an error while globbing file pattern.  JobID: "testing_gdrivesheet_query_job1"

Near as I can tell, this is actually an authentication issue. How can I give Account A's service account appropriate access to Account B's GDrive so it can access that reference table?

Bonus Points: Is there any performance difference between a table backed by a GDrive Sheet vs a native BQ table?

Upvotes: 5

Views: 7535

Answers (4)

riodpp

Reputation: 85

Just one step to add to Evan Kaeding's answer: you can find the Airflow connection in the Airflow UI under "Admin" -> "Connections" -> choose your connection. In my case I also needed to add the keyfile path or the keyfile JSON of the service account to the Airflow connection.

Based on this reference: https://cloud.google.com/composer/docs/how-to/managing/connections#creating_a_connection_to_another_project
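
For reference, here's a minimal sketch of what the connection's Extra field might contain. The field names assume Airflow's google_cloud_platform connection type; the project name and keyfile path are placeholders:

{
  "extra__google_cloud_platform__project": "project-name",
  "extra__google_cloud_platform__key_path": "/path/to/keyfile.json",
  "extra__google_cloud_platform__scope": "https://www.googleapis.com/auth/bigquery,https://www.googleapis.com/auth/drive"
}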

Upvotes: 0

Evan Kaeding

Reputation: 192

For those of you trying to do this via Airflow or Google Cloud Composer, there are two main steps you'll need to do to accomplish this.

  1. Grant view access on the spreadsheet to your service account's email address (the client_email in its JSON keyfile). This should be the same service account you're using to access Google BigQuery. This can be done in the Sheets GUI or programmatically (see the sketch below).

  2. Add the following scope to your Google Cloud connection in Airflow: https://www.googleapis.com/auth/drive

You will then be able to query external tables that reference Google Sheets.
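
For the programmatic route in step 1, a rough sketch using the Drive v3 API could look like the following. The token file, spreadsheet file ID, and service account email are placeholders, and the credentials must belong to an account that is allowed to share the sheet (e.g. its owner):

from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build

# Credentials for an account that can share the sheet;
# 'owner_token.json' is a placeholder for a stored authorized-user token.
owner_creds = Credentials.from_authorized_user_file(
    'owner_token.json', scopes=['https://www.googleapis.com/auth/drive'])

drive = build('drive', 'v3', credentials=owner_creds)

# Grant the service account read access to the spreadsheet.
drive.permissions().create(
    fileId='SPREADSHEET_FILE_ID',  # placeholder: the sheet's Drive file ID
    body={
        'type': 'user',
        'role': 'reader',
        'emailAddress': 'my-sa@my-project.iam.gserviceaccount.com',  # placeholder
    },
    sendNotificationEmail=False,
).execute()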

Upvotes: 4

Juve

Reputation: 10824

While Orbit's answer helped me find a solution, there are a few more things to consider, so I'd like to add my detailed solution to the problem. It is needed if Orbit's basic approach does not work, in particular if you use G Suite and your policies do not allow sharing sheets/docs with accounts outside of your domain. In that case you cannot share a doc/sheet with the service account directly.

Before you start:

  1. Create or select a service account in your project
  2. Enable Domain-wide Delegation (DwD) in the account settings. If not present, this generates an OAuth client ID for the service account.
  3. Make sure the delegated user (a regular account in your domain) has access to the sheet.
  4. Add the required scopes to your service account's OAuth client (you may need to ask a G Suite admin to do this for you):

    • https://www.googleapis.com/auth/bigquery
    • https://www.googleapis.com/auth/drive

If the delegated user can access your drive-based table in the BigQuery UI, your service account should now also be able to access it on behalf of the delegated user.

Here is a full code snippet that worked for me:

#!/usr/bin/env python

import httplib2
from google.cloud import bigquery
from oauth2client.service_account import ServiceAccountCredentials

# Both scopes are needed: BigQuery for the query itself and Drive
# for the sheet backing the external table.
scopes = [
    "https://www.googleapis.com/auth/drive",
    "https://www.googleapis.com/auth/bigquery",
]

delegated_user = "user@your-domain.com"  # a user in your domain with access to the sheet
project        = 'project-name'
table          = 'dataset-name.table-name'
query          = 'SELECT count(*) FROM [%s:%s]' % (project, table)

# Load the service account key, then delegate to the user.
creds = ServiceAccountCredentials.from_json_keyfile_name('secret.json', scopes=scopes)
creds = creds.create_delegated(delegated_user)

# Authorize an HTTP client and hand it to the BigQuery client.
http = creds.authorize(httplib2.Http())
client = bigquery.Client(http=http)

bq = client.run_sync_query(query)
bq.run()
print(bq.fetch_data())

Note that I was not able to set up the delegation directly and needed to create an HTTP client using creds = creds.create_delegated(delegated_user) and http = creds.authorize(httplib2.Http()). The authorized HTTP client can then be used as the HTTP client for the BigQuery client: client = bigquery.Client(http=http).

Also note that the service account does not need to have any predefined roles assigned in the project settings, i.e., you do not have to make it a BigQuery user or even a project owner. I suppose it acquires access primarily via the delegation.

Upvotes: 5

orbit

Reputation: 76

You should be able to get this working with the following steps:

First, share the sheet with the email address ("service account ID") associated with the service account.

Then you'll be able to access your sheet-backed table if you create a Client with both the bigquery and drive scopes. (You might need to have domain-wide delegation enabled on the service account.)

from google.cloud import bigquery
from oauth2client.service_account import ServiceAccountCredentials

PROJECT = 'project-name'  # your GCP project ID
q = 'SELECT count(*) FROM [project-name:dataset-name.table-name]'  # your query

scopes = ['https://www.googleapis.com/auth/bigquery', 'https://www.googleapis.com/auth/drive']

credentials = ServiceAccountCredentials.from_json_keyfile_name(
    '<path_to_json>', scopes=scopes)

# Instantiate a client with both scopes so it can read the sheet-backed table
client = bigquery.Client(project=PROJECT, credentials=credentials)

bqQuery = client.run_sync_query(q)
bqQuery.run()
bqQuery.fetch_data()

Upvotes: 5
