ndueck

Reputation: 709

dynamic parameters on datasets in Kedro

I would like to call an API to enrich an existing dataset.

The existing dataset is a CSVDataSet configured in the catalog.
Now I would like to create a node that enriches the CSVDataSet with data from the API, which I have to call for every row in the CSV file, and then saves the result into a database (SQLTableDataSet). My approach is to create an APIDataSet entry in the catalog and provide it as an input for the node, alongside the CSVDataSet.
The issue here is that the APIDataSet is static (in general, the datasets seem to be very static). I need to call the load function at runtime within the node, once for every entry in the CSV file.

I didn't find a way to do this. Is it just a bad approach? Do I have to call the API within the node instead of creating an APIDataSet?
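
For illustration, the in-node alternative I am asking about would look something like this (the endpoint and column names are made up):

    import pandas as pd
    import requests

    def enrich_with_api(df: pd.DataFrame) -> pd.DataFrame:
        # Hypothetical endpoint: one request per row of the CSV.
        enriched_rows = []
        for record in df.to_dict(orient="records"):
            response = requests.get(
                "https://api.example.com/enrich",
                params={"id": record["id"]},
            )
            response.raise_for_status()
            enriched_rows.append({**record, **response.json()})
        return pd.DataFrame(enriched_rows)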

Upvotes: 0

Views: 1328

Answers (2)

Barros

Reputation: 26

I have done this in my GDALRasterDataSet implementation. The idea is that if you need to enrich a dataset on the fly, you can overload the load() method in a custom dataset and pass additional parameters there.

You can see an implementation here and an example of usage here.

The only extra thing you need to do is rewrite the load() method to accept kwargs (line 143) and write your own _load method that enriches your dataset. Everything else is boilerplate.
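
A minimal sketch of the pattern, assuming Kedro's AbstractDataSet interface (the class name, the ids parameter, and the "id" column are illustrative, not the actual GDALRasterDataSet code):

    from typing import Any, Dict

    import pandas as pd
    from kedro.io import AbstractDataSet

    class EnrichableCSVDataSet(AbstractDataSet):
        def __init__(self, filepath: str):
            self._filepath = filepath

        def load(self, **kwargs) -> pd.DataFrame:
            # Unlike the stock load(), this one forwards runtime
            # parameters down to _load().
            return self._load(**kwargs)

        def _load(self, ids=None) -> pd.DataFrame:
            df = pd.read_csv(self._filepath)
            if ids is not None:
                # Stand-in for whatever per-row enrichment you need.
                df = df[df["id"].isin(ids)]
            return df

        def _save(self, data: pd.DataFrame) -> None:
            data.to_csv(self._filepath, index=False)

        def _describe(self) -> Dict[str, Any]:
            return {"filepath": self._filepath}

You can then call load() with values computed at runtime, e.g. ds.load(ids=[1, 2, 3]).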

Upvotes: 1

datajoely

Reputation: 1516

So typically, we don't like our nodes having knowledge of IO configuration. The belief is that pure Python functions are easier to test, maintain, and build.

Typically, the way to keep this separation is to subclass our APIDataSet, our CSVDataSet, or both, and add your custom logic there.
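
For instance, a rough sketch of the CSVDataSet route (the api_url parameter, the "id" column, and the merge logic are all assumptions to adapt):

    import pandas as pd
    import requests
    from kedro.extras.datasets.pandas import CSVDataSet

    class EnrichedCSVDataSet(CSVDataSet):
        """Loads the CSV, then merges in one API response per row."""

        def __init__(self, filepath: str, api_url: str, **kwargs):
            super().__init__(filepath=filepath, **kwargs)
            self._api_url = api_url

        def _load(self) -> pd.DataFrame:
            df = super()._load()
            # One call per row; hypothetical endpoint keyed on "id".
            extras = [
                requests.get(self._api_url, params={"id": row_id}).json()
                for row_id in df["id"]
            ]
            return df.join(pd.DataFrame(extras, index=df.index))

The catalog entry then points at this class (with filepath and api_url), and the node that writes to the SQLTableDataSet receives an already-enriched DataFrame, so it stays pure.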

Upvotes: 2
