john

Reputation: 61

Combining Spark schemas without duplicates using PySpark?

I am unable to combine the following three schemas into a single final schema without duplicate columns.
Here is my code:

schema1 = StructType([StructField("A", StringType(), True),
                        StructField("B", StringType(), True)])
schema2 = StructType([StructField("c", StringType(), True),
                        StructField("B", StringType(), True)])
schema3 = StructType([StructField("D", StringType(), True),
                        StructField("A", StringType(), True)])
final=(schema1 ++ schema2 ++ schema3).distinct
print( final)

Upvotes: 1

Views: 946

Answers (1)

mck

Reputation: 42422

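from pyspark.sql.types import StructType, StructField, StringType

schema1 = StructType([StructField("A", StringType(), True),
                        StructField("B", StringType(), True)])
schema2 = StructType([StructField("c", StringType(), True),
                        StructField("B", StringType(), True)])
schema3 = StructType([StructField("D", StringType(), True),
                        StructField("A", StringType(), True)])
# StructField is hashable, so set() drops exact duplicates (same name, type, and nullability).
# Note that a set does not preserve the original column order.
final = StructType(list(set(schema1.fields + schema2.fields + schema3.fields)))
print(final)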

gives

StructType(List(StructField(B,StringType,true),StructField(D,StringType,true),StructField(c,StringType,true),StructField(A,StringType,true)))
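Because set() does not preserve order and only removes fields that are identical in name, type, and nullability, the column order in the result can vary. If a stable order matters, or if fields should be deduplicated by name alone, something like the sketch below might help. The helper name merge_unique_fields is hypothetical (it is not part of PySpark). It assumes only that each schema has a .fields list and each field has a .name attribute, as StructType and StructField do.

```python
def merge_unique_fields(*schemas):
    """Collect fields from several schemas, keeping the first field seen per name.

    Hypothetical helper, not a PySpark API: it relies only on each schema
    exposing `.fields` and each field exposing `.name`.
    """
    seen = {}  # plain dicts preserve insertion order (Python 3.7+)
    for schema in schemas:
        for field in schema.fields:
            # setdefault keeps the first occurrence and ignores later repeats
            seen.setdefault(field.name, field)
    return list(seen.values())
```

With the schemas above, StructType(merge_unique_fields(schema1, schema2, schema3)) should list the columns in first-seen order: A, B, c, D. If two schemas define the same name with different types, this sketch silently keeps the first definition, so a real implementation may want to raise an error in that case.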

Upvotes: 2
