Reputation: 7411
I have these two Spark schemas in Scala, and I need to check whether they are equal, ignoring the nullable flag of each column.
val schemaA = StructType(Seq(StructField("date",DateType,true), StructField("account_name",StringType,true)))
val df_A = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schemaA)
val schemaB = StructType(Seq(StructField("date",DateType,false), StructField("account_name",StringType,true)))
val df_B = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schemaB)
In Python, I could have simply done this:
print(
all(
(a.name, a.dataType) == (b.name, b.dataType)
for a, b in zip(df_A.schema, df_B.schema)
)
)
But I'm stuck trying to do the same thing in Scala. Any tips?
Upvotes: 0
Views: 143
Reputation: 40500
Another way to go around the "extra columns" problem mentioned in the comments:
val result = schemaA.map { a => a.name -> a.dataType } == schemaB.map { b => b.name -> b.dataType }
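A Spark-free sketch of why this handles extra columns (using a hypothetical Field case class standing in for StructField): Seq equality also compares lengths, so a schema with an extra field never compares equal.

```scala
// Hypothetical stand-in for Spark's StructField, for illustration only
case class Field(name: String, dataType: String, nullable: Boolean)

val schemaA = Seq(
  Field("date", "DateType", nullable = true),
  Field("account_name", "StringType", nullable = true)
)
// Same names and types (nullable differs), plus one extra field
val schemaB = Seq(
  Field("date", "DateType", nullable = false),
  Field("account_name", "StringType", nullable = true),
  Field("extra", "IntegerType", nullable = true)
)

// Mapping to (name, dataType) pairs drops nullable; Seq equality
// compares element-by-element AND requires equal lengths
val result = schemaA.map(a => a.name -> a.dataType) ==
             schemaB.map(b => b.name -> b.dataType)
// false here, because schemaB has an extra field
```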
Upvotes: 2
Reputation: 37822
Quite similarly to your Python solution:
val result: Boolean = schemaA.zip(schemaB).forall {
case (a, b) => (a.name, a.dataType) == (b.name, b.dataType)
}
(no need to use the DFs).
Do note that both this solution and the Python one might return true
when one of the schemas has extra fields that the other one doesn't, because zip
would simply ignore them.
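To illustrate that pitfall with a Spark-free sketch (a hypothetical Field case class instead of StructField): zip truncates to the shorter sequence, so an extra trailing field goes unnoticed unless you also compare lengths.

```scala
// Hypothetical stand-in for Spark's StructField, for illustration only
case class Field(name: String, dataType: String)

val schemaA = Seq(Field("date", "DateType"), Field("account_name", "StringType"))
val schemaB = schemaA :+ Field("extra", "IntegerType") // one extra field

// zip truncates to the shorter Seq, so the extra field is silently dropped
val naive = schemaA.zip(schemaB).forall {
  case (a, b) => (a.name, a.dataType) == (b.name, b.dataType)
}
// naive is true despite the mismatch

// Adding a length check closes the gap
val strict = schemaA.length == schemaB.length && naive
// strict is false
```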
Upvotes: 2