Reputation: 3124
When updating a view, indirect schema changes do not seem to be picked up.
view1
with a field field1
(e.g. SELECT 1 AS field1
)view2
selecting all fields from view1
view1
to also include field2
(e.g. SELECT 1 AS field1, 2 AS field2
)view2
with the same query as before (due to documented limitation)view1
and view2
including field1
and field2
view1
correctly updated (including field1
and field2
)view2
is only including field1
view2
does actually return field1
and field2
I could delete view2
and re-create it again but that wouldn't be atomic and there would be a time where the view is not available which is undesired.
I also tried to update the schema property of view2
but that is refused with Cannot add fields (field: field2)
:
google.api_core.exceptions.BadRequest: 400 PATCH https://www.googleapis.com/bigquery/v2/projects/<project-id>/datasets/dataset1/tables/view2: Provided Schema does not match Table <project-id>:dataset1.view2. Cannot add fields (field: field2)
Is there any way to update the view atomically while also updating the schema which was changed indirectly (a table/view the view selects from).
Note: of course my view2 would add additional fields and I could currently determine its schema by creating a new temporary view.
Note: the schema is important because tools such as the Data Studio's BigQuery connector are inspecting the schema.
# Python 3.6+
import google.api_core.exceptions
from google.cloud import bigquery
def delete_table_if_exists(client: bigquery.Client, table: bigquery.Table):
try:
client.delete_table(table)
except google.api_core.exceptions.NotFound:
pass
def full_table_id(table: bigquery.Table) -> str:
# Note: the documentation says it should be separated by a dot but uses a colon
return table.full_table_id.replace(':', '.')
def view_test():
client = bigquery.Client()
dataset_ref = client.dataset('dataset1')
try:
client.create_dataset(dataset_ref)
except google.api_core.exceptions.Conflict:
pass
view1 = bigquery.Table(dataset_ref.table('view1'))
view2 = bigquery.Table(dataset_ref.table('view2'))
delete_table_if_exists(client, view1)
delete_table_if_exists(client, view2)
view1.view_query = 'SELECT 1 AS field1'
view1 = client.create_table(view1)
view2.view_query = f'SELECT * FROM `{full_table_id(view1)}`'
client.create_table(view2)
view1.view_query = 'SELECT 1 AS field1, 2 AS field2'
client.update_table(view1, ['view_query'])
client.update_table(view2, ['view_query'])
print('view2 schema:', client.get_table(view2).schema)
# trying to update the schema fails with 'Cannot add fields (field: field2)'
view2.schema = client.get_table(view1).schema
client.update_table(view2, ['schema'])
if __name__ == '__main__':
view_test()
#!/bin/bash
set -e
project_id=$(gcloud config list --format 'value(core.project)' 2>/dev/null)
bq mk -f dataset1
bq rm -f dataset1.view1
bq rm -f dataset1.view2
bq mk --use_legacy_sql=false --view 'SELECT 1 AS field1' dataset1.view1
bq mk --use_legacy_sql=false --view 'SELECT * FROM `'$project_id'.dataset1.view1`' dataset1.view2
bq update --use_legacy_sql=false --view 'SELECT 1 AS field1, 2 AS field2' dataset1.view1
bq update --use_legacy_sql=false --view 'SELECT * FROM `'$project_id'.dataset1.view1`' dataset1.view2
bq show dataset1.view2
def get_create_or_replace_view_query(view: bigquery.Table) -> str:
return f'CREATE OR REPLACE VIEW {view.dataset_id}.{view.table_id} AS {view.view_query}'
def view_test():
# ...
query_job = client.query(get_create_or_replace_view_query(view2))
query_job.result()
print('view2 schema:', client.get_table(view2).schema)
bq query --use_legacy_sql=false 'CREATE OR REPLACE VIEW dataset1.view2 AS SELECT * FROM `'$project_id'.dataset1.view1`'
Upvotes: 7
Views: 7593
Reputation: 33705
You should use a CREATE OR REPLACE VIEW
statement; see the related documentation. BigQuery provides ACID semantics for all queries that perform table modifications, and CREATE OR REPLACE VIEW
is no exception, so this replaces the definition and schema of the view atomically.
Upvotes: 8