Reputation: 167
Can we change the output dataset path dynamically inside my_compute_function, as shown below?
from transforms.api import transform, Input, Output

@transform(
    my_output=Output("/path/to/my/dataset"),
    my_input=Input("/path/to/input"),
)
def my_compute_function(my_output, my_input):
    my_output.path = "new path"  # the line in question
    my_output.write_dataframe(
        my_input.dataframe()
    )
Upvotes: 0
Views: 823
Reputation: 967
No, this is not possible. The reason is that the inputs/outputs/transforms are fixed at "CI-time" or "build-time". When you press "commit" in Authoring or you merge a PR, a CI job is kicked off.
In this CI job, all the relations between inputs and outputs are determined. Output datasets that don't exist yet are created, and a "jobspec" is added to them. A "jobspec" is a snippet of JSON that describes to Foundry how a particular dataset is generated.
Anytime you press the "build" button on a dataset (or build the dataset through a schedule or similar), the jobspec is consulted. It contains a reference to the repository, revision, source file and entry point of the function that builds this dataset. From there the build is orchestrated and kicks off, invoking your function to produce the final output.
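The lookup described above can be pictured with a toy model in plain Python. This is only a sketch to illustrate the idea: the registry, the field names (`repository`, `revision`, `source_file`, `entry_point`), and the `register_at_ci_time` and `build` helpers are hypothetical and are not Foundry's real jobspec schema or API.

```python
import json

# Toy registry standing in for the jobspecs CI attaches to output datasets.
# Field names are illustrative only, not Foundry's real jobspec schema.
JOBSPECS = {}


def my_compute_function(rows):
    """Stand-in for the user's transform: copies input rows to the output."""
    return [dict(row) for row in rows]


# Hypothetical lookup from entry-point name to callable.
ENTRY_POINTS = {"my_compute_function": my_compute_function}


def register_at_ci_time(output_path, repository, revision, source_file, entry_point):
    """Simulate the CI job: store a JSON jobspec keyed by the output path."""
    JOBSPECS[output_path] = json.dumps(
        {
            "repository": repository,
            "revision": revision,
            "source_file": source_file,
            "entry_point": entry_point,
        }
    )


def build(output_path, input_rows):
    """Simulate a build: consult the jobspec, then invoke the function."""
    spec = json.loads(JOBSPECS[output_path])  # KeyError if never registered
    return ENTRY_POINTS[spec["entry_point"]](input_rows)


# Placeholder values; only the output path comes from the question.
register_at_ci_time(
    "/path/to/my/dataset",
    repository="my-repo",
    revision="abc123",
    source_file="my_transform.py",
    entry_point="my_compute_function",
)
result = build("/path/to/my/dataset", [{"id": 1}])
```

In this model, a path that was never registered at CI time, such as "new path", has no jobspec, so `build` has nothing to look up. That is the sense in which reassigning the path inside the function cannot take effect.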
This mechanism allows you to get a "static view" of the entire pipeline, which you can then visualize with Monocle, as you might have seen.
Depending on what your needs are, here are some solutions you might be able to use instead:
The main drawback with the latter approach is that it's not very dynamic, so if a new category shows up, you'll manually have to change the code to "triage" it into a new dataset, until the data becomes available.
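A static "triage" of this kind might look roughly like the following plain-Python sketch. The category names, the output paths, and the `triage` helper are made up for illustration. In an actual transform, each destination would presumably have to be declared up front as its own `Output`, since outputs are fixed at CI time.

```python
from collections import defaultdict

# Hypothetical fixed routing table: every category must be listed in code.
ROUTES = {
    "fruit": "/path/to/fruit_dataset",
    "vegetable": "/path/to/vegetable_dataset",
}


def triage(rows, routes=ROUTES):
    """Split rows by category into the statically declared outputs.

    Returns (routed, unrouted): routed maps output path -> rows, and
    unrouted holds rows whose category has no declared output yet.
    """
    routed = defaultdict(list)
    unrouted = []
    for row in rows:
        path = routes.get(row["category"])
        if path is None:
            unrouted.append(row)  # new category: needs a code change
        else:
            routed[path].append(row)
    return dict(routed), unrouted


rows = [
    {"category": "fruit", "name": "apple"},
    {"category": "vegetable", "name": "leek"},
    {"category": "fungus", "name": "morel"},
]
routed, unrouted = triage(rows)
```

Here the "fungus" row ends up in `unrouted`: nothing picks it up until someone adds a route for it by hand, which is exactly the lack of dynamism described above.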
There are other solutions (ultimately it is possible to make API calls and manually adjust inputs/outputs, for instance), but they are more complex and undesirable from a maintenance perspective.
Upvotes: 0