Reputation: 2763
A BigQuery table has schema which can be viewed in the web UI, updated, or used to load data with the bq
tool as a JSON file. However, I can't find a way to dump this schema from an existing table to a JSON file (preferably from the command-line). Is that possible?
Upvotes: 128
Views: 165413
Reputation: 1628
IF you want to do this from google cloud-console
then a short SQL
query can achieve this.
It'll give you all the info from schema and you can change the STRUCT( ... )
with https://cloud.google.com/bigquery/docs/information-schema-column-field-paths#schema as you wish.
Alternatively use INFORMATION_SCHEMA.<something>
with other views to get different meta info to JSON.
As @Michel Hua said in their answer, select Query results -> JSON
in bigquery to get JSON after running the SQL query
SELECT table_name, ARRAY_AGG(STRUCT(column_name, data_type, description)) as columns
FROM `your-project-id`.your_dataset.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS
WHERE table_name = 'your_table_name'
GROUP BY table_name
Upvotes: 0
Reputation: 1371
The following bash script & sql always helped me solve the problem to extract all tables schema to JSON file from a dataset:
#!/bin/bash
#gen-default-schema.sh
input=$1
source_type=$2
result=tables_${source_type}.result
bq query --format=csv --use_legacy_sql=false --flagfile=$input | awk '{if(NR>1)print}' > $result
while IFS= read -r line
do
tbl_name=`echo "$line" | awk -F. '{print $NF}'`
schema_file=`echo "$tbl_name" | cut -d'_' -f2-`.schema
echo $schema_file
bq show --schema --format=prettyjson $line > ./temp/${source_type}/${schema_file}
echo "done"
done < "$result"
Input file example.sql
($1)
SELECT
table_catalog || ":" || table_schema || "." || table_name
FROM (
SELECT
table_catalog,
table_schema,
table_name
FROM
`project-id`.<dataset_id>.INFORMATION_SCHEMA.TABLES
ORDER BY
table_name ASC )
To run:
$bash gen-default-schema.sh example.sql example
This will place all the JSON schema under ./temp
folder
Upvotes: 2
Reputation: 3923
Upvotes: 47
Reputation: 21
As of 15th May 2022, this worked:
Upvotes: 2
Reputation: 1720
You can add the flag --schema
[1] in order to avoid table data information.
bq show --schema --format=prettyjson [PROJECT_ID]:[DATASET].[TABLE] > [SCHEMA_FILE]
bq show --schema --format=prettyjson myprojectid:mydataset.mytable > /tmp/myschema.json
[1] https://cloud.google.com/bigquery/docs/managing-table-schemas
Upvotes: 127
Reputation: 1777
Answer update
Since October 2020,
you can also run a SQL query on INFORMATION_SCHEMA.COLUMNS
which is kind of an introspective functionality.
SELECT *
FROM <YOUR_DATASET>.INFORMATION_SCHEMA.COLUMNS
and nest the data using an aggregation function such as
SELECT table_name, ARRAY_AGG(STRUCT(column_name, data_type)) as columns
FROM <YOUR_DATASET>.INFORMATION_SCHEMA.COLUMNS
GROUP BY table_name
The are also interesting metadata in INFORMATION_SCHEMA.VIEWS
if you also need the source code from your views.
Then hit save results / JSON from the BigQuery interface, or wrap it into the bq query
command line in your case.
Source: BigQuery release notes
Upvotes: 15
Reputation: 3653
You can use REST API call to get BigQuery table schema as JSON. Documentation link: https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/get
curl 'https://bigquery.googleapis.com/bigquery/v2/projects/project-name/datasets/dataset-name/tables/table-name' \
--header 'Authorization: Bearer [YOUR_ACCESS_TOKEN]' \
--header 'Accept: application/json' \
--compressed
Upvotes: 4
Reputation: 172944
a way to dump schema from an existing table to a JSON file (preferably from the command-line). Is that possible?
try below
bq show bigquery-public-data:samples.wikipedia
You can use –format flag to prettify output
--format: none|json|prettyjson|csv|sparse|pretty:
Format for command output. Options include:
none: ...
pretty: formatted table output
sparse: simpler table output
prettyjson: easy-to-read JSON format
json: maximally compact JSON
csv: csv format with header
The first three are intended to be human-readable, and the latter three are for passing to another program. If no format is selected, one will be chosen based on the command run.
Realized I provided partial answer :o)
Below does what PO wanted
bq show --format=prettyjson bigquery-public-data:samples.wikipedia | jq '.schema.fields'
Upvotes: 187