Am1rr3zA

Reputation: 7411

Scala loop over 2 sequences at the same time

I have these two Spark schemas (Scala sequences of StructFields), and I need to check whether they are equal, ignoring the nullable flag of each column.

val schemaA = StructType(Seq(StructField("date",DateType,true), StructField("account_name",StringType,true)))

val df_A = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schemaA)

val schemaB = StructType(Seq(StructField("date",DateType,false), StructField("account_name",StringType,true)))

val df_B = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schemaB)

In python, I could have simply done this:

 print(
     all(
         (a.name, a.dataType) == (b.name, b.dataType)
         for a, b in zip(df_A.schema, df_B.schema)
     )
 )

But I'm stuck trying to do the same thing in Scala. Any tips?

Upvotes: 0

Views: 143

Answers (2)

Dima

Reputation: 40500

Another way to go around the "extra columns" problem mentioned in the comments:

val result = schemaA.map(a => (a.name, a.dataType)) == schemaB.map(b => (b.name, b.dataType))
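For the example schemas in the question this evaluates to true (only the nullable flag of "date" differs), and because the full sequences are compared it also catches extra columns. A quick sketch of that, assuming the schemaA/schemaB definitions from the question and the usual org.apache.spark.sql.types._ import; the extra "balance" column is made up purely for illustration:

import org.apache.spark.sql.types._

// Names and types match, only nullability differs => true
println(schemaA.map(a => (a.name, a.dataType)) == schemaB.map(b => (b.name, b.dataType)))

// Append an extra column: the sequences now have different lengths => false
val schemaC = schemaB.add(StructField("balance", DoubleType, true))
println(schemaA.map(a => (a.name, a.dataType)) == schemaC.map(c => (c.name, c.dataType)))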

Upvotes: 2

Tzach Zohar

Reputation: 37822

Quite similarly to your Python solution:

val result: Boolean = schemaA.zip(schemaB).forall {
  case (a, b) => (a.name, a.dataType) == (b.name, b.dataType)
}

(no need to use the DFs).

Do note that both this solution and the Python one might return true when one of the schemas has extra fields that the other one doesn't, because zip simply ignores them.
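If you need to rule that case out, one simple extension (a sketch, not part of the original answer) is to also compare the number of fields before zipping:

val sameSchema: Boolean =
  schemaA.length == schemaB.length &&
    schemaA.zip(schemaB).forall { case (a, b) =>
      (a.name, a.dataType) == (b.name, b.dataType)
    }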

Upvotes: 2
