Reputation: 86
I have a DataFrame with a column I created with collect_set. Its type is:
t.StructField("list_of_stuff", t.ArrayType(t.StringType(), False), True)
I want to write a test that validates this DataFrame by comparing it to another one that I load from a JSON file using the same schema. Although every row in the file contains a valid array value in this field, the loaded DataFrame ends up with the type below for that column (the other columns are the same):
t.StructField("list_of_stuff", t.ArrayType(t.StringType(), True), True)
So when I compare the two with assert_frame_equal, I get an error saying the column is not the same, because the array's containsNull flag is False in one schema and True in the other.
So 2 questions here:
Upvotes: 2
Views: 45
Reputation: 86
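I managed to handle #2 by passing the column through an identity UDF whose declared return type is an array with containsNull set to False: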
from pyspark.sql import types as t
from pyspark.sql.functions import udf
converter = udf(lambda x: x, t.ArrayType(t.StringType(), False))
df = df.withColumn("list_of_stuff", converter("list_of_stuff"))
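If the goal is only to make the comparison pass, another option might be to compare the schemas while ignoring nullability flags instead of rewriting the column. Below is a minimal sketch, assuming the schemas are available as the dicts returned by `StructType.jsonValue()`. `strip_nullability` and `array_field` are hypothetical helpers written for this example, not part of PySpark, and the dicts are hand-written so the snippet runs without a Spark session.

```python
def strip_nullability(node):
    """Return a copy of a schema JSON node without nullability flags.

    Caveat: this also drops keys with these names inside field metadata.
    """
    flags = ("nullable", "containsNull", "valueContainsNull")
    if isinstance(node, dict):
        return {
            key: strip_nullability(value)
            for key, value in node.items()
            if key not in flags
        }
    if isinstance(node, list):
        return [strip_nullability(item) for item in node]
    return node


def array_field(contains_null):
    # Hand-written dict in the shape produced by StructType.jsonValue()
    return {
        "type": "struct",
        "fields": [{
            "name": "list_of_stuff",
            "type": {
                "type": "array",
                "elementType": "string",
                "containsNull": contains_null,
            },
            "nullable": True,
            "metadata": {},
        }],
    }


expected = array_field(False)  # column built with collect_set
loaded = array_field(True)     # column loaded from the JSON file

# The raw schemas differ only in the containsNull flag
schemas_differ = expected != loaded
# After stripping nullability flags they compare equal
schemas_match_ignoring_nulls = (
    strip_nullability(expected) == strip_nullability(loaded)
)
```

With real DataFrames this would be called as `strip_nullability(df.schema.jsonValue())` on both sides. It only checks the schemas, so the row data still has to be compared separately.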
Upvotes: 1