Reputation: 102
In Palantir Foundry, I can see that we can write unit tests using Pytest or TransformRunner. My understanding is that with Pytest we cannot pass the output of a transform for unit testing, and with TransformRunner we cannot use the dataset that the transform would originally run on; we need separate test data. But I would like to use the whole input dataset that my transform runs on in production and run tests on its output. How can I achieve that?
Upvotes: 1
Views: 977
Reputation: 37177
You can't access Foundry datasets from CI. You'll need to put a data snippet in a file within your repo and load it from there.
`test/fixtures/data/input/a.csv`:

```csv
col_a,col_b
1,2
```
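```python
import os

from transforms.api import Input, Output, Pipeline, transform_df
# Import path as commonly used in Foundry repos; check it against your transforms version.
from transforms.verbs.testing.TransformRunner import TransformRunner

# Folder holding the CSV fixtures, located relative to this test file.
TEST_DATA_DIR = os.path.join(os.path.dirname(__file__), '..', '..', 'fixtures', 'data')


def test_runner_single_table(spark_session):  # spark_session: pytest fixture from the Foundry test setup
    pipeline = Pipeline()

    @transform_df(Output('/test_single_table/output/test'),
                  input_a=Input('/test_single_table/input/a'))
    def transform_1(input_a):
        return input_a.withColumn('col_c', input_a['col_a'] + input_a['col_b'])

    pipeline.add_transforms(transform_1)
    # '/test_single_table' is the Foundry-only prefix the runner ignores when looking up fixtures.
    runner = TransformRunner(pipeline, '/test_single_table', TEST_DATA_DIR)
    output = runner.build_dataset(spark_session, '/test_single_table/output/test')
    assert output.first()['col_c'] == 3
```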
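If you also want a quick check of the fixture file and the expected `col_c` value that does not need a Spark session, a minimal standard-library sketch could look like this. `load_fixture_rows` and `add_col_c` are hypothetical helpers, not Foundry APIs; `add_col_c` only mirrors the logic of `transform_1` in plain Python, so it complements the `TransformRunner` test rather than replacing it.

```python
import csv
import os
import tempfile


def load_fixture_rows(data_dir, relative_path):
    """Hypothetical helper: read a CSV fixture (e.g. input/a.csv) into dicts of ints."""
    with open(os.path.join(data_dir, relative_path), newline="") as f:
        return [{k: int(v) for k, v in row.items()} for row in csv.DictReader(f)]


def add_col_c(rows):
    """Hypothetical plain-Python mirror of transform_1: col_c = col_a + col_b."""
    return [{**row, "col_c": row["col_a"] + row["col_b"]} for row in rows]


if __name__ == "__main__":
    # Recreate the fixture layout from the answer in a temporary directory.
    with tempfile.TemporaryDirectory() as data_dir:
        os.makedirs(os.path.join(data_dir, "input"))
        with open(os.path.join(data_dir, "input", "a.csv"), "w", newline="") as f:
            f.write("col_a,col_b\n1,2\n")
        rows = add_col_c(load_fixture_rows(data_dir, os.path.join("input", "a.csv")))
        assert rows[0]["col_c"] == 3
        print(rows)  # [{'col_a': 1, 'col_b': 2, 'col_c': 3}]
```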
`TransformRunner` translates each `Input` path into a local directory path. In the example above:

- `TEST_DATA_DIR` tells the runner where the data lives in your repo.
- `'/test_single_table'` tells the runner which path prefix to ignore, since that prefix only exists for Foundry datasets, not within your repo.
- `input/a` is taken from `Input('[ignored_sub_path]/input/a')` and resolved against the folder structure you defined in your repo.

If you want to understand these values better, you can print them; the output shows up in the CI checks.
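The path mapping described above can be sketched in plain Python. This is an illustration under assumptions, not `TransformRunner`'s actual code: `resolve_input_path` is a hypothetical helper that strips the ignored prefix and joins the remainder onto the local data directory, and it does not model how the runner then finds `a.csv` inside `input/`.

```python
import os


def resolve_input_path(dataset_path, ignored_prefix, data_dir):
    """Hypothetical helper (not part of the transforms library).

    Strips the Foundry-only prefix from a dataset path and resolves the
    remainder (e.g. 'input/a') against the local fixture directory.
    """
    prefix = ignored_prefix.rstrip("/") + "/"
    if not dataset_path.startswith(prefix):
        raise ValueError(f"{dataset_path!r} does not start with {prefix!r}")
    relative = dataset_path[len(prefix):]  # e.g. 'input/a'
    return os.path.join(data_dir, *relative.split("/"))


if __name__ == "__main__":
    # On POSIX this prints /repo/test/fixtures/data/input/a
    print(resolve_input_path("/test_single_table/input/a",
                             "/test_single_table",
                             "/repo/test/fixtures/data"))
```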
Upvotes: 0