Databricks schema enforcement issues

Question

As suggested in the article about schema enforcement, a declared schema helps detecting issues early.

The two issues described below however are preventing me from creating a descriptive schema.

Comments on a table column are seen as a difference in the schema

# Get data
test_df = spark.createDataFrame([('100000146710')], ['code'])
# ... save
test_df.write.format("delta").mode("append").save('/my_table_location')

# Create table: ... BOOM
spark.sql("""
   CREATE TABLE IF NOT EXISTS my_table (
       code STRING COMMENT 'Unique identifier'
   ) USING DELTA LOCATION '/my_table_location'
""")

This will fail with AnalysisException: The specified schema does not match the existing schema at /my_table_location . The only solution I found is to drop the columnt comments.

Not null struct field shows as nullable

json_schema = StructType([
  StructField("code", StringType(), False)
])

json_df = (spark.read
  .schema(json_schema)
  .json('/my_input.json')
)
json_df.printSchema()

will show

root
  |-- code: string (nullable = true)

So despite the schema declaration stating that a field is not null, the field shows as nullable in the dataframe. Because of this, adding a NOT NULL constraint on the table column will trigger the AnalysisException error.

Any comments or suggestions are welcome.

Databricks schema enforcement issues

Answers (1)

Related Questions