Update BigQuery view with indirect schema changes

Question

When updating a view, indirect schema changes do not seem to be picked up.

Steps to reproduce

Create view1 with a field field1 (e.g. SELECT 1 AS field1)
Create view2 selecting all fields from view1
Update view1 to also include field2 (e.g. SELECT 1 AS field1, 2 AS field2)
Update view2 with the same query as before (due to documented limitation)

Desired outcome

Schema for view1 and view2 including field1 and field2
View updates should be atomic

Actual outcome

Schema for view1 correctly updated (including field1 and field2)
Schema for view2 is only including field1
Selecting from view2 does actually return field1 and field2

I could delete view2 and re-create it again but that wouldn't be atomic and there would be a time where the view is not available which is undesired.

I also tried to update the schema property of view2 but that is refused with Cannot add fields (field: field2):

google.api_core.exceptions.BadRequest: 400 PATCH https://www.googleapis.com/bigquery/v2/projects//datasets/dataset1/tables/view2: Provided Schema does not match Table :dataset1.view2. Cannot add fields (field: field2)

Question

Is there any way to update the view atomically while also updating the schema which was changed indirectly (a table/view the view selects from).

Note: of course my view2 would add additional fields and I could currently determine its schema by creating a new temporary view.

Note: the schema is important because tools such as the Data Studio's BigQuery connector are inspecting the schema.

Code to reproduce the steps

# Python 3.6+
import google.api_core.exceptions
from google.cloud import bigquery


def delete_table_if_exists(client: bigquery.Client, table: bigquery.Table):
    try:
        client.delete_table(table)
    except google.api_core.exceptions.NotFound:
        pass


def full_table_id(table: bigquery.Table) -> str:
    # Note: the documentation says it should be separated by a dot but uses a colon
    return table.full_table_id.replace(':', '.')


def view_test():
    client = bigquery.Client()

    dataset_ref = client.dataset('dataset1')
    try:
        client.create_dataset(dataset_ref)
    except google.api_core.exceptions.Conflict:
        pass

    view1 = bigquery.Table(dataset_ref.table('view1'))
    view2 = bigquery.Table(dataset_ref.table('view2'))
    delete_table_if_exists(client, view1)
    delete_table_if_exists(client, view2)

    view1.view_query = 'SELECT 1 AS field1'
    view1 = client.create_table(view1)

    view2.view_query = f'SELECT * FROM `{full_table_id(view1)}`'
    client.create_table(view2)

    view1.view_query = 'SELECT 1 AS field1, 2 AS field2'
    client.update_table(view1, ['view_query'])

    client.update_table(view2, ['view_query'])
    print('view2 schema:', client.get_table(view2).schema)

    # trying to update the schema fails with 'Cannot add fields (field: field2)'
    view2.schema = client.get_table(view1).schema
    client.update_table(view2, ['schema'])


if __name__ == '__main__':
    view_test()

Bash example doing the same

#!/bin/bash

set -e

project_id=$(gcloud config list --format 'value(core.project)' 2>/dev/null)

bq mk -f dataset1

bq rm -f dataset1.view1
bq rm -f dataset1.view2

bq mk --use_legacy_sql=false --view 'SELECT 1 AS field1' dataset1.view1
bq mk --use_legacy_sql=false --view 'SELECT * FROM `'$project_id'.dataset1.view1`' dataset1.view2

bq update --use_legacy_sql=false --view 'SELECT 1 AS field1, 2 AS field2' dataset1.view1
bq update --use_legacy_sql=false --view 'SELECT * FROM `'$project_id'.dataset1.view1`' dataset1.view2

bq show dataset1.view2

Update: Code with accepted answer

Python code

def get_create_or_replace_view_query(view: bigquery.Table) -> str:
    return f'CREATE OR REPLACE VIEW {view.dataset_id}.{view.table_id} AS {view.view_query}'


def view_test():
    # ...
    query_job = client.query(get_create_or_replace_view_query(view2))
    query_job.result()
    print('view2 schema:', client.get_table(view2).schema)

Bash magic

bq query --use_legacy_sql=false 'CREATE OR REPLACE VIEW dataset1.view2 AS SELECT * FROM `'$project_id'.dataset1.view1`'

Elliott Brossard · Accepted Answer

You should use a CREATE OR REPLACE VIEW statement; see the related documentation. BigQuery provides ACID semantics for all queries that perform table modifications, and CREATE OR REPLACE VIEW is no exception, so this replaces the definition and schema of the view atomically.