Simon Breton

Reputation: 2876

How to compare two schemas in a Databricks notebook in Python

I'm going to ingest data using a Databricks notebook. I want to validate the schema of the ingested data against the schema I'm expecting it to have.

So basically I have:

    from pyspark.sql.types import StructType, StructField, StringType, IntegerType

    validation_schema = StructType([
      StructField("a", StringType(), True),
      StructField("b", IntegerType(), False),
      StructField("c", StringType(), False),
      StructField("d", StringType(), False)
    ])

    data_ingested_good = [("foo",1,"blabla","36636"),
     ("foo",2,"booboo","40288"),
     ("bar",3,"fafa","42114"),
     ("bar",4,"jojo","39192"),
     ("baz",5,"jiji","32432")
    ]

    data_ingested_bad = [("foo","1","blabla","36636"),
     ("foo","2","booboo","40288"),
     ("bar","3","fafa","42114"),
     ("bar","4","jojo","39192"),
     ("baz","5","jiji","32432")
    ]

    # Plain Python lists have no printSchema(); create DataFrames first
    df_good = spark.createDataFrame(data_ingested_good, ["a", "b", "c", "d"])
    df_bad = spark.createDataFrame(data_ingested_bad, ["a", "b", "c", "d"])

    df_good.printSchema()
    df_bad.printSchema()
    # StructType has no printSchema() either; print its string form instead
    print(validation_schema.simpleString())

I've seen similar questions, but the answers are always in Scala.

Upvotes: 2

Views: 5384

Answers (2)

Another method: you can find the difference with simple Python list comparisons.

dept = [("Finance", 10),
        ("Marketing", 20),
        ("Sales", 30),
        ("IT", 40)]
deptColumns = ["dept_name", "dept_id"]

dept1 = [("Finance", 10, '999'),
         ("Marketing", 20, '999'),
         ("Sales", 30, '999'),
         ("IT", 40, '999')]
deptColumns1 = ["dept_name", "dept_id", "extracol"]

deptDF = spark.createDataFrame(data=dept, schema=deptColumns)
dept1DF = spark.createDataFrame(data=dept1, schema=deptColumns1)
deptDF_columns = deptDF.schema.names
dept1DF_columns = dept1DF.schema.names

# Columns present in dept1DF but missing from deptDF
list_difference = []
for item in dept1DF_columns:
    if item not in deptDF_columns:
        list_difference.append(item)

print(list_difference)
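Comparing only column names misses type mismatches (such as the asker's `"1"` vs. `1` case). A minimal pure-Python sketch of the same set-difference idea, extended to (name, type) pairs; the two hand-written lists below are stand-ins for what `df.dtypes` would return, and `schema_diff` is a hypothetical helper, not a Spark API:

```python
# Stand-ins for expected_df.dtypes and actual_df.dtypes
expected = [("dept_name", "string"), ("dept_id", "bigint")]
actual = [("dept_name", "string"), ("dept_id", "string"), ("extracol", "string")]

def schema_diff(expected, actual):
    """Return (missing columns, unexpected columns, type mismatches)."""
    exp_map, act_map = dict(expected), dict(actual)
    missing = sorted(set(exp_map) - set(act_map))
    extra = sorted(set(act_map) - set(exp_map))
    mismatched = sorted(
        col for col in set(exp_map) & set(act_map) if exp_map[col] != act_map[col]
    )
    return missing, extra, mismatched

print(schema_diff(expected, actual))  # ([], ['extracol'], ['dept_id'])
```

This catches both the extra column and the `dept_id` type change in one pass, which the name-only loop above cannot.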


Upvotes: 2

Alex Ott

Reputation: 87249

It really depends on your exact requirements and the complexity of the schemas you want to compare: for example, whether to ignore the nullability flag or take it into account, the order of columns, support for maps/structs/arrays, etc. Also, do you want to see the differences, or just a flag indicating whether the schemas match?

In the simplest case it could be as simple as following - just compare string representations of schemas:

def compare_schemas(df1, df2):
  return df1.schema.simpleString() == df2.schema.simpleString()
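If you want the comparison to ignore nullability, one option is to compare `df.schema.jsonValue()` after stripping the nullability flags. A sketch under that assumption; the `strip_nullability` helper is mine, and the two dicts are hand-written stand-ins for what `jsonValue()` returns:

```python
# Recursively drop the nullability keys that StructType.jsonValue() emits:
# "nullable" (struct fields), "containsNull" (arrays), "valueContainsNull" (maps).
def strip_nullability(node):
    if isinstance(node, dict):
        return {
            k: strip_nullability(v)
            for k, v in node.items()
            if k not in ("nullable", "containsNull", "valueContainsNull")
        }
    if isinstance(node, list):
        return [strip_nullability(v) for v in node]
    return node

# Stand-ins for df1.schema.jsonValue() / df2.schema.jsonValue();
# the schemas differ only in nullability.
schema1 = {"type": "struct", "fields": [
    {"name": "a", "type": "string", "nullable": True, "metadata": {}}]}
schema2 = {"type": "struct", "fields": [
    {"name": "a", "type": "string", "nullable": False, "metadata": {}}]}

print(strip_nullability(schema1) == strip_nullability(schema2))  # True
```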

I would personally recommend using an existing library, such as chispa, that has more advanced schema comparison functions: you can tune the checks, it will show the differences, etc. After installation (you can just run %pip install chispa), the following will throw an exception if the schemas are different:

from chispa.schema_comparer import assert_schema_equality

assert_schema_equality(df1.schema, df2.schema)

Upvotes: 4
