NutellaKing
NutellaKing

Reputation: 113

How do I create a BigQuery Scheduled Query that calls a script/stored procedure with Python?

I am trying to modify a python script that i inherited (I am new to Python) that creates BigQuery Scheduled Queries that are located in a json file. This script works fine until I needed to add a stored procedure to it.

The Stored Procedure doesn't have a destination table, destination dataset, or a write preference. However, if I leave those null, blank or don't pass them I get errors telling me I need to provide a destination table and dataset. If I pass them then BigQuery tells me I can't have a destination table for a script. I've found a lot of documentation on simply setting this up but nothing on calling scripts.

Below are examples of what i have so far.

json

{
    "query_file": "./scheduled/gaps.sql",
    "schedule": "Every 15 Minutes",
    "dest_dataset": "dataset",
    "dest_table": "gaps"
},
{
        "query_file": "./scheduled/stored_procedure.sql",
        "schedule": "Every 15 Minutes",
        "dest_dataset": "",
        "dest_table": ""
    }

where this is called, which isp retty much the same as in the documentation

transfer_config = bigquery_datatransfer.TransferConfig(

    destination_dataset_id=schedule.dest_dataset,
    display_name=schedule.formatted_display_name(),
    data_source_id="scheduled_query",
    params={
        "query": query.replace("${project}.", f"{self._project_id}."),
        "destination_table_name_template": schedule.dest_table,
         "write_disposition": "WRITE_TRUNCATE",
        "partitioning_field": ""
    },
    schedule=schedule.schedule
)

result = self._transfer_client.create_transfer_config(
    bigquery_datatransfer.CreateTransferConfigRequest(
        parent=self._parent,
        transfer_config=transfer_config,
        service_account_name=self._service_account_name
    ),
)

Upvotes: 0

Views: 2156

Answers (1)

guillaume blaquiere
guillaume blaquiere

Reputation: 75735

Scheduled query is designed to run... queries. Stored procedure isn't a query, it's a procedure, that you can invoke from a query.

That introduction to tell you that you have 2 types of object: Query and stored procedure. You have to make an evolution in your JSON schema and add the type of object (query, function, procedure, view,...) and your code need to perform different BigQuery API calls accordingly.

But, be careful, there is a trap in that design because some object can have dependencies with others. For instance, you create a stored procedure and a query that use the stored procedure. The stored procedure must be created before the scheduled query, else it will fail.

So, the problem that you have to solve is not simple. That's why, for those kind of dependencies and configuration, IaC (infrastructure as code) tools like Terraform are very powerful to manage the dependency (explicitly or implicitly).

Upvotes: 1

Related Questions