Reputation: 21
I am trying to select * rows from 3 tables that match a TABLE_SUFFIX pattern, the thing is I didn't recieve the expected output.
The query I am using:
SELECT
*
FROM
`project-id.airbyte_google_ads.client_id_*`
WHERE
REGEXP_CONTAINS(_TABLE_SUFFIX, r"_campaign_performance_overview$")
The ouput recieved contains columns of other tables, and not from the ones that I want, but if I am using:
SELECT
DISTINCT _TABLE_SUFFIX as tables
FROM
`project-id.airbyte_google_ads.client_id_*`
WHERE
REGEXP_CONTAINS(_TABLE_SUFFIX, r"_campaign_performance_overview$")
The tables names from which I want to select rows, are correct.
My tought is that something is wrong at wildcard line, and i tought if there can be a way to use it somehow like:
`project-id.airbyte_google_ads.client_id_*_campaign`
or something similar, because looks like the query does something at FROM statement, and does whats in WHERE at a different point.
Let me know what are your toughts on that. Thank you for your time!
Upvotes: 0
Views: 1797
Reputation: 1906
As per this documentation, when using wildcard tables, all the tables in the dataset that begin with the table name before * are scanned even if _TABLE_SUFFIX
is used in combination with REGEXP_CONTAINS
. In our case, the wildcard pattern is client_id_*
and hence, the values such as client_id_1_campaigns
are also matched irrespective of the pattern in REGEXP_CONTAINS
.
The reason for this behaviour is that, the wildcard pattern precedes the regex and scans all the tables matching the wildcard pattern and will not take the regex into account. Using wildcards while also using REGEXP_CONTAINS
is applying regex on top of regex and is not recommended.
If you wish to have the intended target tables you will need to use the below query instead of using wildcards to query multiple tables.
SELECT *
FROM (
SELECT * FROM `project-id.dataset-id.client_id_2_campaign_performance_overview` UNION ALL
SELECT * FROM `project-id.dataset-id.client_id_7_campaign_performance_overview` UNION ALL
SELECT * FROM `project-id.dataset-id.client_id_10_campaign_performance_overview`);
Using the LIKE
operator also does not give the expected results for the same reason mentioned above. The tables are scanned first then filtered giving extra columns in the result.
Also, BigQuery uses the schema for the most recently created table that matches the wildcard as the schema for the wildcard table. Even if you restrict the number of tables that you want to use from the wildcard table using the _TABLE_SUFFIX pseudo column in a WHERE clause, BigQuery uses the schema for the most recently created table that matches the wildcard. You will see the extra columns in the result if the most recently created table has them.
Upvotes: 1