cenumah

Reputation: 63

How to test multi-output pyspark transforms in Foundry

We want to write unit tests for Python transforms that have multiple outputs (i.e. the @transform decorator), but we haven't been able to construct the TransformOutput objects we need to pass to the function under test.
What's the best way to do this?

Upvotes: 2

Views: 316

Answers (1)

Rob

Reputation: 1030

You can create fake inputs and outputs as follows, then pass them into your @transform function:

class FakeTransformInput:
    def __init__(self, df):
        self.df = df

    def dataframe(self):
        return self.df

    def set_mode(self, mode):
        pass

class FakeTransformOutput:
    def __init__(self, df):
        self.df = df

    def dataframe(self):
        return self.df

    def write_dataframe(self, df, partition_cols=None, bucket_cols=None,
                        bucket_count=None, sort_by=None, output_format=None,
                        options=None, column_descriptions=None,
                        column_typeclasses=None):
        self.df = df

    def set_mode(self, mode):
        pass

And to use them:

from pyspark.sql.types import StructType, StructField, StringType

output_schema = StructType([
    StructField("col_1", StringType(), True),
    StructField("col_2", StringType(), True),
    StructField("col_n", StringType(), True),
])
output_transform = FakeTransformOutput(spark_session.createDataFrame([], output_schema))

input_transform = FakeTransformInput(spark_session.createDataFrame(input_df))

YOUR_MODULE.compute(
    input_transform, output_transform
)

# Perform assertions on output_transform

Upvotes: 3
