Andi Maier

Reputation: 1034

Azure Databricks INFORMATION_Schema

I am using Azure Databricks and need a way to find out which columns are allowed to be NULL in several tables. For MySQL there is the well-known INFORMATION_SCHEMA, which does not exist in Databricks.

My idea was to use Spark SQL to read the schema from the table itself. I am now wondering whether this is an equivalent way to get the information that INFORMATION_SCHEMA would provide. My approach looks like this:

df = spark.sql("SELECT * FROM mytable")
df.schema  # each StructField in the schema carries a nullable flag

Any comment would be much appreciated!

Upvotes: 1

Views: 1737

Answers (1)

Alex Ott

Reputation: 87069

By default, any column of a Spark DataFrame can be null. If you need to enforce that some data must not be null, you can either check the data in code before writing it, or use the constraints supported by Delta tables, such as NOT NULL, or CHECK (for arbitrary conditions). With these constraints in place, Spark validates the data before writing and fails the write if it violates a constraint.

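A sketch of what this looks like in Delta SQL (the table and column names are hypothetical; this requires a Databricks or Delta Lake runtime, so it is shown as illustration only):

```sql
-- NOT NULL is declared as part of the column definition
CREATE TABLE events (
  id     BIGINT NOT NULL,
  amount DOUBLE
) USING DELTA;

-- CHECK constraints are added afterwards for arbitrary conditions
ALTER TABLE events ADD CONSTRAINT amount_non_negative CHECK (amount >= 0);

-- This insert would now fail at write time, because id is NULL
INSERT INTO events VALUES (NULL, 10.0);
```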

P.S. You can get more information about a table's schema and these constraints with SQL commands like DESCRIBE TABLE or DESCRIBE TABLE EXTENDED.
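For example, run in a notebook cell (table name hypothetical):

```sql
DESCRIBE TABLE mytable;           -- column names, types, comments
DESCRIBE TABLE EXTENDED mytable;  -- additionally shows table properties, including constraints
```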

Upvotes: 2

Related Questions