Reputation: 79
I want to be able to overwrite the WHERE
clause in Kedro dataset queries. Let's say I have the following catalogue entry:
some_table.raw:
type: pandas.GBQQueryDataset
sql: SELECT * FROM database.table WHERE date >= {start_date}
Then, in the code, I want to overwrite it with something like:
catalog.load("some_table.raw", query={"start_date": "2024-01-01"})
I know it's impossible since the load
method supports no arguments except for the dataset name. But perhaps there are some workarounds I don't know about.
I would really appreciate your help here!
There's a suggestion to use TemplatedConfigLoader
, but that will only allow me to load a dataset with one value of start_date
. But I want to be able to pass different values in different parts of code.
Upvotes: 0
Views: 181
Reputation: 1
Using runtime_params
You can dynamically overwrite the WHERE clause in Kedro dataset queries using the runtime_params
feature. This allows you to inject parameters at runtime when creating a Kedro session. Here's how you can do it:
Step 1: Define the Catalog Entry
In your catalog.yml
, use runtime_params
for the SQL query:
some_table.raw:
type: pandas.GBQQueryDataset
sql: SELECT * FROM database.table WHERE date >= ${runtime_params:start_date}
Step 2: Create a Kedro Session with Extra Parameters
In your code, create a Kedro session and pass the parameters via extra_params
:
from kedro.framework.session import KedroSession
extra_params = {"start_date": "2024-01-01"}
with KedroSession.create(extra_params=extra_params) as session:
context = session.load_context()
data = context.catalog.load("some_table.raw")
References For more detailed information, you can refer to the the discussion on GitHub Issue #2169 and Kedro documentation
Upvotes: 0