Get BigQuery table schema using google.cloud

Question

I can, for example, pull BigQuery data into a local Python session with:

import os
from google.cloud import bigquery

project_id = "example-project"
dataset_id = "example_dataset"
table_id = "table_id"

os.environ["GOOGLE_CLOUD_PROJECT"] = project_id
bq = bigquery.Client()

query = "SELECT * FROM {}.{} LIMIT 5".format(dataset_id, table_id)
resp = bq.run_sync_query(query)
resp.run()
data_list = resp.rows

The result:

print(data_list)
>>> [('BEDD', '1',), ('A75', '1',), ('CE3F', '1',), ('0D8C', '1',), ('3E9C', '1',)]

How do I then get the schema for this table, so that I end up with, for example:

headings = ('heading1', 'heading2')
# or
schema_dict = {'fields': [{'name': 'heading1', 'type': 'STRING'}, {'name': 'heading2', 'type': 'STRING'}]}

Willian Fuks · Accepted Answer

You can use the schema attribute of your resp variable.

After running the query you can retrieve it:

schema = resp.schema

schema will be a list containing the definition for each column in your query.

As an example, let's say this is your query:

query = "select '1' as fv, STRUCT (1, 2) t  from `dataset.table` limit 1"

The schema will be a list containing 2 entries:

[<google.cloud.bigquery.schema.SchemaField object>, <google.cloud.bigquery.schema.SchemaField object>]

Each object in schema has the attributes field_type, fields, mode and name, so if you run:

schema[0].field_type, schema[0].mode, schema[0].name

The result is "STRING", "NULLABLE", "fv".
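
Building on those attributes, the tuple of column headings asked for in the question can be collected in a single pass over the schema. This is only a minimal sketch: the Field namedtuple is a hypothetical stand-in for the objects in resp.schema, used so the snippet runs without a BigQuery connection, and the heading names are taken from the question.

```python
from collections import namedtuple

# Hypothetical stand-in for the objects in resp.schema; it only mimics the
# four attributes described above (name, field_type, mode, fields).
Field = namedtuple("Field", ["name", "field_type", "mode", "fields"])

# Illustrative schema for a two-column table of strings.
schema = [
    Field("heading1", "STRING", "NULLABLE", ()),
    Field("heading2", "STRING", "NULLABLE", ()),
]

# Collect the column names in the order the query returned them.
headings = tuple(field.name for field in schema)
print(headings)
```

With a real response, the same generator expression should work on resp.schema unchanged, since it only reads the name attribute.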

Since the second column is a record, if you run:

schema[1].field_type, schema[1].mode, schema[1].name, schema[1].fields

The result is:

"RECORD", "NULLABLE", "t", [google schema 1, google schema 2]

Here google schema 1 and google schema 2 stand for the schema objects that define the inner fields of the record.

As far as I know, there's no way to get a dictionary like the one in your question directly, so you'll have to loop over the entries in schema and build it yourself. It should be simple, though. I haven't fully tested this, but it should give you an idea of how to do it:

def extract_schema(schema_resp):
    l = []
    for schema_obj in schema_resp:
        r = {}
        r['name'] = schema_obj.name
        r['type'] = schema_obj.field_type
        r['mode'] = schema_obj.mode
        if schema_obj.fields:
            r['fields'] = extract_schema(schema_obj.fields)
        l.append(r)
    return l

So you'd just have to run schema = extract_schema(resp.schema) and (hopefully) you'll be good to go.
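
To get the exact shape shown in the question, the resulting list can be wrapped under a 'fields' key. The sketch below repeats the function so it runs on its own, and again uses a hypothetical Field namedtuple in place of the real schema objects. The inner field names a and b are made up for illustration and are not what BigQuery would return for the STRUCT in the example query.

```python
from collections import namedtuple

# Hypothetical stand-in for the objects in resp.schema (assumption: only the
# name, field_type, mode and fields attributes are needed).
Field = namedtuple("Field", ["name", "field_type", "mode", "fields"])


def extract_schema(schema_resp):
    """Recursively turn schema objects into plain dictionaries."""
    result = []
    for schema_obj in schema_resp:
        entry = {
            "name": schema_obj.name,
            "type": schema_obj.field_type,
            "mode": schema_obj.mode,
        }
        # Records carry their inner fields in .fields; recurse into them.
        if schema_obj.fields:
            entry["fields"] = extract_schema(schema_obj.fields)
        result.append(entry)
    return result


# Illustrative schema: one string column and one record with two integers.
schema = [
    Field("fv", "STRING", "NULLABLE", ()),
    Field("t", "RECORD", "NULLABLE", (
        Field("a", "INTEGER", "NULLABLE", ()),
        Field("b", "INTEGER", "NULLABLE", ()),
    )),
]

schema_dict = {"fields": extract_schema(schema)}
print(schema_dict)
```

Unlike the dictionary in the question, each entry also carries a 'mode' key. Drop that line from the function if only 'name' and 'type' are wanted.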
