N.N.
N.N.

Reputation: 3172

Error: TABLE_QUERY expressions cannot query BigQuery tables

This s a followup question regarding Jordans answer here: Weird error in BigQuery

I was using to query reference table within "Table_Query" for quit some time. Now, following the recent changes Joradan is referring to, many of our queries are broken... I would like to ask the community advice for alternative solution to what we are doing.

I have tables containing events ("MyTable_YYYYMMDD"). I want to query my data for a period of a specific (or several) campaign. The period of that campaign is stored in a table with all campaigns data (ID, StartCampaignDate, EndCampaignDate). In order to query only the relevant tables, we use Table_Query(), and within the TableQuery() we construct a list of all relevant table names based on the campaigns data. This query runs in various forms many times with different params. the reason for using wildcard function (rather than query the entire dataset), is performance, execution costs, and maintenance costs. So, having it query all tables and filter just the results is not an option as it drives execution costs too high.

a sample query will look like:

SELECT
  *
FROM
  TABLE_QUERY([MyProject:MyDataSet] 'table_id IN  
  (SELECT CONCAT("MyTable_",STRING(Year*100+Month)) TBL_NAME  
    FROM DWH.Dim_Periods P  
    CROSS JOIN DWH.Campaigns AS LC  
    WHERE ID IN ("86254e5a-b856-3b5a-85e1-0f5ab3ff20d6") 
    AND DATE(P.Date) BETWEEN DATE(StartCampaignDate) AND DATE(EndCampaignDate))')

This is now broken... My question - the info, which tables should you query is stored on a reference table, How would you query only the relevant tables (partitions) when "TableQuery" is no longer allowed to query reference tables?

Many thanks

Upvotes: 4

Views: 386

Answers (2)

N.N.
N.N.

Reputation: 3172

This is not ideal solution. But it seems to do the job.

In my previous query I passed the IDs List as a parameter in an external process that constructed the query. I wanted this process to be unaware to any logic implemented in the query.

Eventually we came up with this solution:

Instead of passing a list of IDs, we pass a JSON that contains the relevant meta data for each ID. We parse this JSON within the Table_Query() function. So instead of querying a physical reference table, we query some sort of a "table variable" that we have put in a JSON.
Below is a sample query that runs on the public dataset that demonstrates this solution.

    SELECT
  YEAR,
  COUNT (*) CNT
FROM
  TABLE_QUERY([fh-bigquery:weather_gsod], 'table_id in
(Select table_id
From
(Select table_id,concat(Right(table_id,4),"0101") as TBL_Date from [fh-bigquery:weather_gsod.__TABLES_SUMMARY__]
where table_id Contains "gsod"
)TBLs
CROSS JOIN 
(select 
Regexp_Replace(Regexp_extract(SPLIT(DatesInput,"},{"),r"\"fromDate\":\"(\d\d\d\d-\d\d-\d\d)\""),"-","") as fromDate,
Regexp_Replace(Regexp_extract(SPLIT(DatesInput,"},{"),r"\"toDate\":\"(\d\d\d\d-\d\d-\d\d)\""),"-","") as toDate,
FROM
(Select
"[  
      {  
         \"CycleID\":\"123456\",
         \"fromDate\":\"1929-01-01\",
         \"toDate\":\"1950-01-10\"
      },{  
         \"CycleID\":\"123456\",
         \"fromDate\":\"1970-02-01\",
         \"toDate\":\"2000-02-10\"
      }
   ]"
   as DatesInput)) RefDates
   WHERE TBLs.TBL_Date>=RefDates.fromDate
   AND TBLs.TBL_Date<=RefDates.toDate
)')
GROUP BY
  YEAR
ORDER BY
  YEAR

This solution is not ideal as it requires an external process to be aware of the data stored in the reference tables. Ideally the BigQuery team will re-enable this very useful functionality.

Upvotes: 0

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

The "simple" way I see is split it to two steps
Step 1 - build list that will be used to filter table_id's

SELECT GROUP_CONCAT_UNQUOTED(
                   CONCAT('"',"MyTable_",STRING(Year*100+Month),'"')
       ) TBL_NAME_LIST 
FROM DWH.Dim_Periods P  
CROSS JOIN DWH.Campaigns AS LC  
WHERE ID IN ("86254e5a-b856-3b5a-85e1-0f5ab3ff20d6") 
AND DATE(P.Date) BETWEEN DATE(StartCampaignDate) AND DATE(EndCampaignDate)

Note the change in your query to transform result to list that you will use in step 2

Step 2 - final query

SELECT
  *
FROM
  TABLE_QUERY([MyProject:MyDataSet], 
              'table_id IN (<paste list (TBL_NAME_LIST) built in first query>)')

Above steps are easy to implement in any client you potentially using
If you use it from within BigQuery Web UI - this makes you do a little extra manual "moves" that you might not be happy about

My answer is obvious and you most likely have this already as an option, but wanted to mention

Upvotes: 2

Related Questions