Ivan Konovalov
Ivan Konovalov

Reputation: 79

Is there a way to overwrite a Kedro dataset query in code?

I want to be able to overwrite the WHERE clause in Kedro dataset queries. Let's say I have the following catalogue entry:

some_table.raw:
    type: pandas.GBQQueryDataset
    sql: SELECT * FROM database.table WHERE date >= {start_date}

Then, in the code, I want to overwrite it with something like:

catalog.load("some_table.raw", query={"start_date": "2024-01-01"})

I know it's impossible since the load method supports no arguments except for the dataset name. But perhaps there are some workarounds I don't know about.

I would really appreciate your help here!

There's a suggestion to use TemplatedConfigLoader, but that will only allow me to load a dataset with one value of start_date. But I want to be able to pass different values in different parts of code.

Upvotes: 0

Views: 181

Answers (1)

Jitendra
Jitendra

Reputation: 1

Using runtime_params

You can dynamically overwrite the WHERE clause in Kedro dataset queries using the runtime_params feature. This allows you to inject parameters at runtime when creating a Kedro session. Here's how you can do it:

Step 1: Define the Catalog Entry In your catalog.yml, use runtime_params for the SQL query:

some_table.raw:
    type: pandas.GBQQueryDataset
    sql: SELECT * FROM database.table WHERE date >= ${runtime_params:start_date}

Step 2: Create a Kedro Session with Extra Parameters In your code, create a Kedro session and pass the parameters via extra_params:

from kedro.framework.session import KedroSession

extra_params = {"start_date": "2024-01-01"}

with KedroSession.create(extra_params=extra_params) as session:
    context = session.load_context()
    data = context.catalog.load("some_table.raw")

References For more detailed information, you can refer to the the discussion on GitHub Issue #2169 and Kedro documentation

Upvotes: 0

Related Questions