Feczo
Feczo

Reputation: 598

Combine hundreds of bigquery tables into one

I have hundreds of BigQuery tables and want to combine them all into one. I've started with trying to list what tables I have. I have tried bq ls, but it only shows maybe 20 - 50 tables. Also how do I do something similar to SELECT * INTO ... in Bigquery.

I have many tables for my application logs, like:

Upvotes: 3

Views: 1657

Answers (2)

DoomGoober
DoomGoober

Reputation: 1693

Run this command in the Google Cloud Console Command Line:

bq cp --append_table 1234:dataSetSource.tableSource 5678:dataSetDest.tableDest

Where 1234 is the project number of the source project from the dashboard and 5678 is the project number of the destination. Replace the dataset and table names with your dataset and table names.

For hundreds of tables, you can create a script to run all the commands.

Upvotes: 0

Feczo
Feczo

Reputation: 598

To achieve SELECT * INTO ... you can do SELECT * FROM all tables with comma and specify a destination table as per https://cloud.google.com/bigquery/bq-command-line-tool#createtablequery

for example:

DATASET=[YOUR_DATASET_NAME]
TABLES=$(bq ls --max_results=1000 --format=csv $DATASET | \
  grep -v "tableId,Type" | cut -d "," -f 1 | tr "\n" ",")
bq --dataset_id=$DATASET query  --destination_table=$DATASET.merged \
  "select * from $TABLES"

Note: If the tables have common fields, but they do not completely line up you you have to list the common fields instead of *

If you have a particular pattern of table names you can also utilize wildcards in your query https://cloud.google.com/bigquery/query-reference#tablewildcardfunctions

for example to match the 201506* tables from the example weblog dataset

DATASET=weblog
PREFIX=201506
bq query --destination_table=$DATASET.merged query "SELECT * FROM 
  (TABLE_QUERY($DATASET, 'REGEXP_MATCH(table_id, r\"^"$PREFIX"[\d]{4}\")'))"

Upvotes: 5

Related Questions