Reputation: 1747
I want to read the column descriptions and typeclasses from my upstream datasets, then I want to simply pass them through to my downstream datasets.
How can I do this in Python Transforms?
Upvotes: 1
Views: 495
Reputation: 1747
If you upgrade your repository to at least 1.206.0, you'll be able to access a new feature in the Transforms Python API: reading and writing column descriptions and typeclasses. For visibility, this question is also highly related to this one.
The column_typeclasses property gives back a structured Dict<str, List<Dict<str, str>>>. For example, a column named tags will have a column_typeclasses object of {'tags': [{"name": "my_name", "kind": "my_kind"}]}. A typeclass always consists of two components, a name and a kind, and both are present in every dictionary of the list shown above. These are the only two keys you can pass in this dict, and the value for each key must be a str.
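To make the shape described above concrete, here is a minimal sketch that checks a typeclasses dict before it is handed to write_dataframe. The helper validate_typeclasses and the constant ALLOWED_KEYS are hypothetical names made up for illustration; they are not part of transforms.api.

```python
from typing import Dict, List

# Hypothetical helper for illustration only; not part of transforms.api.
ALLOWED_KEYS = {"name", "kind"}


def validate_typeclasses(column_typeclasses: Dict[str, List[Dict[str, str]]]) -> None:
    """Raise ValueError if an entry is not exactly {'name': str, 'kind': str}."""
    for column, typeclasses in column_typeclasses.items():
        for entry in typeclasses:
            # Every typeclass must carry exactly the two keys "name" and "kind"
            if set(entry) != ALLOWED_KEYS:
                raise ValueError(
                    f"Column {column!r}: expected keys {sorted(ALLOWED_KEYS)}, "
                    f"got {sorted(entry)}"
                )
            # Both values must be plain strings
            if not all(isinstance(value, str) for value in entry.values()):
                raise ValueError(f"Column {column!r}: name and kind must be str")


# The example from the answer passes the check without raising
validate_typeclasses({"tags": [{"name": "my_name", "kind": "my_kind"}]})
```

Running a check like this inside the compute function, before the write, should surface a malformed entry with the offending column name.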
Full documentation is in the works for this feature, so stay tuned.
from transforms.api import transform, Input, Output


@transform(
    my_output=Output("ri.foundry.main.dataset.my-output-dataset"),
    my_input=Input("ri.foundry.main.dataset.my-input-dataset"),
)
def my_compute_function(my_input, my_output):
    recent = my_input.dataframe().limit(10)

    # Read the metadata from the upstream dataset
    existing_typeclasses = my_input.column_typeclasses
    existing_descriptions = my_input.column_descriptions

    # Pass it through unchanged when writing the downstream dataset
    my_output.write_dataframe(
        recent,
        column_descriptions=existing_descriptions,
        column_typeclasses=existing_typeclasses
    )
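If your transform drops or renames columns before writing, you may prefer to pass through only the metadata for columns that still exist. The sketch below is one way to do that with plain dicts. keep_existing_columns is a hypothetical helper, not part of transforms.api, and the sample descriptions assume that each description is a plain string keyed by column name.

```python
from typing import Dict, Iterable, TypeVar

V = TypeVar("V")


# Hypothetical helper for illustration only; not part of transforms.api.
def keep_existing_columns(metadata: Dict[str, V], columns: Iterable[str]) -> Dict[str, V]:
    """Return only the metadata entries whose column is in `columns`."""
    wanted = set(columns)
    return {column: value for column, value in metadata.items() if column in wanted}


# Made-up sample data standing in for my_input.column_descriptions
descriptions = {"tags": "Free-form labels", "legacy_id": "Old key"}
typeclasses = {"tags": [{"name": "my_name", "kind": "my_kind"}]}

# Stands in for the column list of the dataframe being written
output_columns = ["tags", "created_at"]

print(keep_existing_columns(descriptions, output_columns))
# {'tags': 'Free-form labels'}
print(keep_existing_columns(typeclasses, output_columns))
# {'tags': [{'name': 'my_name', 'kind': 'my_kind'}]}
```

Inside the compute function above, this would look like keep_existing_columns(my_input.column_descriptions, recent.columns), with the result passed as column_descriptions.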
Upvotes: 2