How to overwrite a table in atomic fashion in BigQuery client Python API

Question

Here is a code snippet I used as a reference from GCP documentation:

job_config = bigquery.QueryJobConfig()
# Set the destination table
table_ref = client.dataset(dataset_id).table('your_table_id')
job_config.destination = table_ref
sql = """
    SELECT corpus
    FROM `bigquery-public-data.samples.shakespeare`
    GROUP BY corpus;
"""

# Start the query, passing in the extra configuration.
query_job = client.query(
    sql,
    # Location must match that of the dataset(s) referenced in the query
    # and of the destination table.
    location='US',
    job_config=job_config)  # API request - starts the query

query_job.result()  # Waits for the query to finish
print('Query results loaded to table {}'.format(table_ref.path))

This works fine but if the table already exists, this spits out an error. I know how to delete the table first but I am wondering if there is a way to make this overwrite the table in atomic fashion so that the table always exists.

shollyman · Accepted Answer

You can control how results are persisted through a combination of setting the create_disposition and write_disposition. The python library exposes these options in QueryJobConfig, and links to more details from the REST API docs.

For queries, the default behavior for write dispostion is WRITE_EMPTY, which causes a failure if the table already exists. Switching that to WRITE_TRUNCATE should give you the atomic replacement of data that you're looking for.

TL;DR: just add this to your job config:

job_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE

How to overwrite a table in atomic fashion in BigQuery client Python API

Answers (1)

Related Questions