Reputation: 709
I would like to call an API to enrich an existing dataset. The existing dataset is a CSVDataSet configured in the catalog. Now I would like to create a node that enriches the CSVDataSet with data from the API, which I have to call for every row in the CSV file, and then save the data into a database (SQLTableDataSet). My approach is to create an APIDataSet entry in the catalog and provide it as an input for the node, next to the CSVDataSet.
The issue here is that the APIDataSet is static (in general, the datasets seem to be very static), whereas I need to call the load function at runtime within the node, once for every entry in the CSV file. I didn't find a way to do this. Is it just a bad approach? Do I have to call the API within the node instead of creating an APIDataSet?
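For illustration, this is roughly what calling the API inside the node would look like (the endpoint, the "id" column and the function name are made up):

    import pandas as pd
    import requests

    # Hypothetical node: takes the DataFrame loaded from the CSVDataSet,
    # calls the API once per row, and returns a DataFrame that Kedro
    # then saves through the SQLTableDataSet output.
    def enrich_with_api(df: pd.DataFrame) -> pd.DataFrame:
        enriched_rows = []
        for _, row in df.iterrows():
            response = requests.get(
                "https://api.example.com/enrich",  # placeholder endpoint
                params={"id": row["id"]},          # made-up key column
                timeout=10,
            )
            response.raise_for_status()
            enriched_rows.append({**row.to_dict(), **response.json()})
        return pd.DataFrame(enriched_rows)

But this puts the IO (the API call) inside the node, which is exactly what I was trying to avoid by using an APIDataSet.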
Upvotes: 0
Views: 1328
Reputation: 26
I have done this in my GDALRasterDataSet implementation. The idea is that if you need to enrich a dataset on the fly, you can overload the load() method in a custom dataset and pass additional parameters there. You can see an implementation here and an example of usage here. The only extra thing you need to do is to rewrite the load() method to accept kwargs (line 143) and write your own _load method that enriches your dataset. Everything else is boilerplate.
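A minimal sketch of that pattern (the class is illustrative and assumes the kedro.io.AbstractDataSet interface of older Kedro versions; the API details are made up):

    from typing import Any, Dict

    import pandas as pd
    import requests
    from kedro.io import AbstractDataSet

    class EnrichmentAPIDataSet(AbstractDataSet):
        """Custom dataset whose load() accepts parameters at runtime."""

        def __init__(self, url: str):
            self._url = url

        # Rewrite the public load() so callers can pass kwargs per call.
        def load(self, **kwargs: Any) -> pd.DataFrame:
            return self._load(**kwargs)

        # Custom _load that enriches: each call may use different
        # parameters, e.g. one call per row of the CSV file.
        def _load(self, **params: Any) -> pd.DataFrame:
            response = requests.get(self._url, params=params, timeout=10)
            response.raise_for_status()
            return pd.DataFrame([response.json()])

        def _save(self, data: pd.DataFrame) -> None:
            raise NotImplementedError("Read-only dataset.")

        def _describe(self) -> Dict[str, Any]:
            return {"url": self._url}

With that in place, dataset.load(id=...) (or whatever parameters your API needs) can be called once per CSV row.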
Upvotes: 1
Reputation: 1516
So typically, we don't like our nodes having knowledge of IO configuration. The belief is that functionally pure Python functions are easier to test, maintain, and build. The way we would keep this distinction is for you to subclass our APIDataSet / CSVDataSet (or both) and then add your custom logic there.
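One way to sketch that (assuming kedro.extras.datasets.pandas.CSVDataSet from an older Kedro version; the api_url parameter and the "id" key column are made up):

    from typing import Any

    import pandas as pd
    import requests
    from kedro.extras.datasets.pandas import CSVDataSet

    class EnrichedCSVDataSet(CSVDataSet):
        """Loads the CSV, then enriches every row with an API response."""

        def __init__(self, api_url: str, **kwargs: Any):
            super().__init__(**kwargs)
            self._api_url = api_url

        def _load(self) -> pd.DataFrame:
            df = super()._load()
            extra = []
            for _, row in df.iterrows():
                # One API call per CSV row, keyed on a made-up "id" column.
                response = requests.get(
                    self._api_url, params={"id": row["id"]}, timeout=10
                )
                response.raise_for_status()
                extra.append(response.json())
            return df.join(pd.DataFrame(extra, index=df.index))

This keeps the node itself a pure function: it receives the already-enriched DataFrame and only has to return what should be written to the SQLTableDataSet.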
Upvotes: 2