Olsgaard
Olsgaard

Reputation: 1582

Insert data into / creating a sharded/wildcard table in BigQuery

There is a lot of documentation on gcp about querying sharded/wildcard tables [1][2], but I can't seem to figure out how to create or insert data into such as table.

Here's a trivial and mostly nonsensical example:

-- STANDARD SQL

CREATE TABLE IF NOT EXISTS  `mydataset.mytable_CA` AS
SELECT "CA" as COUNTRY_CODE

CREATE TABLE IF NOT EXISTS  `mydataset.mytable_US` AS
SELECT "US" as COUNTRY_CODE

SELECT * FROM `mydataset.mytable_*` WHERE _TABLE_SUFFIX = "CA"

Wildcards in CREATE-staments

How do I use wildcards in the create statement?

-- STANDARD SQL
-- DOES NOT WORK
CREATE TABLE IF NOT EXISTS  `mydataset.mytable_*` AS
SELECT "US" as COUNTRY_CODE WHERE _TABLE_SUFFIX = "US"

Wildcards in INSERT-statements

Similarly, how do I do inserts?

Is it possible to do something along the lines of the following:

-- STANDARD SQL
-- DOES NOT WORK
INSERT INTO `mydataset.mytable_*` WHERE _TABLE_SUFFIX = COUNTRY_CODE
SELECT * FROM UNNEST(["US", "US", "CA"]) as COUNTRY_CODE

I know it is recommended to use partitioning and clustering, but I am asking specifically about wildcards, so please indulge me :)

Is this possible using standard SQL? If not, how about legacy SQL?

Upvotes: 0

Views: 3165

Answers (1)

rmesteves
rmesteves

Reputation: 4085

As you can see here, DML has some limitations in BigQuery and one of them is:

Queries that contain Data Manipulation Language (DML) statements cannot use a wildcard table as the target of the query. For example, a wildcard table may be used in the FROM clause of an UPDATE query, but a wildcard table cannot be used as the target of the UPDATE operation.

In other words, you can not use a wildcard to specify a table to be created, updated or to have new records inserted. Despite that, you can use it to specify the source of your date.

Let's take a look in some examples of how to use wildcards with DML and DDL.

INSERTING

If you're inserting data into a wildcard table, you need to insert your data in the final table. In other words, you need to specify the coplete path including the suffix.

INSERT INTO `mydataset.mytable_US` <your query or your values>

If you're using the wildcard table as the source of data for inserting data in a normal table, you can use wildcards in the WHERE clause:

INSERT INTO `mydataset.mynormaltable` 
SELECT <fields> 
FROM `mydataset.mytable_*` 
WHERE _TABLE_SUFFIX = "US"


CREATING

Again, you can not use wildcard in the target, but you can use it to specify a source of data:

CREATE TABLE IF NOT EXISTS  `mydataset.normal_table` 
AS SELECT <fields> 
FROM `mydataset.mytable_*` 
WHERE _TABLE_SUFFIX = "US"

If you want to create a new wildcard table with the same prefix, you can do:

CREATE TABLE IF NOT EXISTS  `mydataset.mytable_GB` 
AS SELECT <fields> 
FROM `mydataset.table`

In this example you created a new suffix named GB. After this, you will be able to query your table with the GB suffix normaly.

Finally, I would like to attach here a reference to DML and DDL for BigQuery.

I hope it helps

Upvotes: 4

Related Questions