DR_S

Reputation: 102

Taking the full Input dataset when testing transformations in Palantir Foundry

In Palantir Foundry, I can see that we can write unit tests using Pytest or TransformRunner. My understanding is that with Pytest we cannot pass the output of a transform in for unit testing, and with TransformRunner we cannot use the dataset the transform actually reads; we need separate test data. However, I would like to run the transform on the whole input dataset it uses in production and run tests on the resulting output. How can I achieve that?

Upvotes: 1

Views: 977

Answers (1)

fmsf

Reputation: 37177

You can't access Foundry datasets from CI. You'll need to put a snippet of the data in a file within your repo and load it from there.
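Because CI can't reach the dataset, one way to get a realistic snippet is to trim a local export of the production data and commit the result as the fixture. A minimal sketch, assuming you already have the dataset as a local CSV file; `write_fixture_snippet` and its parameters are hypothetical helpers, not part of Foundry:

```python
import csv
import itertools
import os


def write_fixture_snippet(source_csv, fixture_csv, max_rows=100):
    """Hypothetical helper: copy the header plus at most max_rows data rows
    from source_csv into fixture_csv. Returns the number of data rows written."""
    # Create e.g. test/fixtures/data/input/ if it does not exist yet.
    os.makedirs(os.path.dirname(fixture_csv) or ".", exist_ok=True)
    written = 0
    with open(source_csv, newline="") as src, open(fixture_csv, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:
            return 0  # empty source: leave an empty fixture file
        writer.writerow(header)
        # Take only the first max_rows rows so the fixture stays small enough to commit.
        for row in itertools.islice(reader, max_rows):
            writer.writerow(row)
            written += 1
    return written
```

For example, `write_fixture_snippet("exports/a_full.csv", "test/fixtures/data/input/a.csv", max_rows=500)` (hypothetical paths). Taking the first rows keeps the sketch simple; if the head of the file is not representative, sample rows instead.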

test/fixtures/data/input/a.csv

col_a,col_b
1,2
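
import os

from transforms.api import Input, Output, Pipeline, transform_df
# Assumed import path for TransformRunner; check it against your repository's transforms version.
from transforms.verbs.testing.TransformRunner import TransformRunner

# Assumes this test module sits two directories below test/, so this resolves to test/fixtures/data.
TEST_DATA_DIR = os.path.join(os.path.dirname(__file__), '..', '..', 'fixtures', 'data')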


def test_runner_single_table(spark_session):
    pipeline = Pipeline()

    @transform_df(Output('/test_single_table/output/test'),
                  input_a=Input('/test_single_table/input/a'))
    def transform_1(input_a):
        return input_a.withColumn('col_c', input_a['col_a'] + input_a['col_b'])

    pipeline.add_transforms(transform_1)

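    # Strip the '/test_single_table' prefix from Input/Output paths and look for data under TEST_DATA_DIR.
    runner = TransformRunner(pipeline, '/test_single_table', TEST_DATA_DIR)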

    output = runner.build_dataset(spark_session, '/test_single_table/output/test')
    assert output.first()['col_c'] == 3

TransformRunner translates the Input path into a directory path. In the example above:

  • TEST_DATA_DIR tells the runner where the test data lives in your environment.
  • '/test_single_table' tells the runner which path prefix to ignore, since that prefix only exists for Foundry datasets, not within your repo.
  • input/a, the part of Input('[ignored_sub_path]/input/a') that remains, is resolved against TEST_DATA_DIR and the folder structure you defined in your repo, which here points at test/fixtures/data/input/a.csv.
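The mapping described in these bullets can be approximated in plain Python. This is a hypothetical `resolve_input_path` helper that illustrates the described behaviour, not TransformRunner's actual implementation; the `.csv` extension is an assumption based on the fixture file used in this answer:

```python
import os


def resolve_input_path(input_path, ignored_prefix, data_dir, extension=".csv"):
    """Hypothetical illustration of mapping a Foundry Input path onto a fixture file.

    On POSIX, '/test_single_table/input/a' with ignored_prefix '/test_single_table'
    and data_dir 'test/fixtures/data' maps to 'test/fixtures/data/input/a.csv'.
    """
    prefix = ignored_prefix.rstrip("/") + "/"
    if not input_path.startswith(prefix):
        raise ValueError(f"{input_path!r} is not under {ignored_prefix!r}")
    # Drop the Foundry-only prefix, keeping e.g. 'input/a'.
    relative = input_path[len(prefix):]
    # Rebuild the remainder as a path under the fixture directory.
    return os.path.join(data_dir, *relative.split("/")) + extension
```

If a test cannot find its input, comparing the path this kind of mapping produces with the actual layout under `test/fixtures/data` is a quick way to spot a mismatched prefix or folder name.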

You can print these properties (for example TEST_DATA_DIR) from within a test; the output shows up in the CI checks, which helps if you want to understand them better.
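As a sketch, a hypothetical helper like `describe_fixture_dir` (not part of Foundry) could be called from inside a test to print where TEST_DATA_DIR actually points and which fixture files exist beneath it. Pytest reports captured stdout for failing tests, or for all tests when run with `-s`:

```python
import os


def describe_fixture_dir(data_dir):
    """Hypothetical helper: print the absolute fixture directory and the files
    beneath it, and return their paths relative to that directory."""
    root = os.path.abspath(data_dir)
    print(f"TEST_DATA_DIR resolves to {root}")
    if not os.path.isdir(root):
        print("  (directory does not exist)")
        return []
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # walk subdirectories in a stable order
        for name in sorted(filenames):
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            print(f"  {rel}")
            found.append(rel)
    return found
```

Calling `describe_fixture_dir(TEST_DATA_DIR)` at the start of `test_runner_single_table` would list `input/a.csv` if the layout matches what the runner expects.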

Upvotes: 0
