Reputation: 63
We want to write unit tests for Python transforms that have multiple outputs (i.e. use the @transform decorator) and have not been able to build the TransformOutput objects we need to pass to the function we're testing.
What's the best way to do this?
Upvotes: 2
Views: 316
Reputation: 1030
You can create fake inputs and outputs as follows, then pass them into your @transform
function:
class FakeTransformInput:
    def __init__(self, df):
        self.df = df

    def dataframe(self):
        return self.df

    def set_mode(self, mode):
        pass


class FakeTransformOutput:
    def __init__(self, df):
        self.df = df

    def dataframe(self):
        return self.df

    def write_dataframe(
            self, df, partition_cols=None, bucket_cols=None, bucket_count=None,
            sort_by=None, output_format=None, options=None,
            column_descriptions=None, column_typeclasses=None):
        self.df = df

    def set_mode(self, mode):
        pass
And to use them:
from pyspark.sql.types import StringType, StructField, StructType

output_schema = StructType([
    StructField("col_1", StringType(), True),
    StructField("col_2", StringType(), True),
    StructField("col_n", StringType(), True),
])

output_transform = FakeTransformOutput(spark_session.createDataFrame([], output_schema))
input_transform = FakeTransformInput(spark_session.createDataFrame(input_df))

YOUR_MODULE.compute(input_transform, output_transform)

# Perform assertions on output_transform, e.g. on output_transform.dataframe()
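Since the fakes just store whatever object is handed to them as df, the same pattern extends naturally to transforms with multiple outputs: build one FakeTransformOutput per output, pass them all to compute, and assert on each afterwards. Here is a minimal, dependency-free sketch of that wiring — the compute function is a hypothetical stand-in for your transform body, and plain lists stand in for Spark DataFrames purely to keep the illustration runnable without a Spark session:

```python
class FakeTransformInput:
    def __init__(self, df):
        self.df = df

    def dataframe(self):
        return self.df


class FakeTransformOutput:
    def __init__(self, df):
        self.df = df

    def dataframe(self):
        return self.df

    def write_dataframe(self, df, **kwargs):
        # Capture whatever the transform writes so the test can inspect it.
        self.df = df


# Hypothetical multi-output transform body: splits input rows by parity.
def compute(source, evens_output, odds_output):
    rows = source.dataframe()
    evens_output.write_dataframe([r for r in rows if r % 2 == 0])
    odds_output.write_dataframe([r for r in rows if r % 2 != 0])


source = FakeTransformInput([1, 2, 3, 4, 5])
evens = FakeTransformOutput(None)
odds = FakeTransformOutput(None)

compute(source, evens, odds)

assert evens.dataframe() == [2, 4]
assert odds.dataframe() == [1, 3, 5]
```

In a real test you would replace the lists with spark_session.createDataFrame(...) calls as shown above and assert on each fake output's dataframe() independently.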
Upvotes: 3