Her0Her0
Her0Her0

Reputation: 547

How to tell BigQuery not to "check" my query when I send it? I keep getting table not found

Long story short, I am sending multiple batch queries after each other.

query_1 creates a table. Let's call it table_1.

query_2 creates a table, let's call it table_2, using table_1.

When I send the second batch query, because the first one has not been run, BigQuery instantly returns:

Error: Not found: Table table_1 was not found in location US.

However, as batch queries are queued, query_1 will inevitably run before query_2, so when query_2 is actually run, table_1 will exist.

I have two ideas:

  1. Tell BigQuery not to check for table validity when it received the query.
  2. Add a phony conditional to the from clause, i.e. in pseudocode: SELECT * FROM (table_1 if it exists else table_2). This would always evaluate to table_1, as, as mentioned above, query_1 would complete prior to query_2 being launched.

I have tried both, and failed, because:

  1. I could not find the correct option to send.
  2. I could not find an SQL way to do this.

Please note that it is not possible to send queries 1 and 2 at once. If it is relevant, the queries are sent via a cloud function on GCP. The function is written in node. The only options I use are location, query, and priority ("BATCH").

Thank you for any help you can provide!

Upvotes: 1

Views: 60

Answers (2)

Vibhor Gupta
Vibhor Gupta

Reputation: 699

You can try following code to check for availability of table_1, if its not exist then it should query table_2.

IF (
  SELECT COUNT(1) AS cnt
  FROM `proj.dataset`.__TABLES_SUMMARY__
  WHERE table_id = 'table_1'   
  ) = 1 
THEN
  SELECT * from `proj.dataset.table_1`;
ELSE
  SELECT * FROM `proj.dataset.table_2`;
END IF;

Upvotes: 0

SeungwooLee
SeungwooLee

Reputation: 969

Your second cloud function query has a dependancy to first one.

So it must be guaranteed that the first query is excuted when second one is processed.

I think there are 2 options in this case.

First is, Insert interval time to your cloud function code between first and second query.

I'm not sure of this scenario as you noted that it is not possible to send both queries at once.

Second is, Use Cloud Dataflow.

It is batch data processing service that can query bigQuery table.

This seems more suitable option than using cloud function jobs.

Dataflow also can process multiple query asynchronously, So your both queries are guaranteed to be in order.

If your cloud function is triggered by Cloud Pub/sub messages, Using Dataflow SQL is best practice for this scenario.

Upvotes: 1

Related Questions