Reputation: 1034
I am using Azure Databricks and need a way to find out which columns are allowed to be NULL in several tables. In MySQL there is the well-known INFORMATION_SCHEMA, which does not exist in Databricks.
My idea was to use Spark SQL and derive the schema from the resulting DataFrame. I am now wondering whether this is an equivalent way to generate the information schema. My approach looks like this:
df = spark.sql("SELECT * FROM mytable")
df.schema
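Concretely, I was thinking of reading the nullable flag that each StructField in the returned schema carries, roughly like this (mytable is just a placeholder name):

# Sketch: print the nullability flag of every column in the schema
for field in df.schema.fields:
    print(field.name, field.dataType.simpleString(), field.nullable)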
Any comment would be much appreciated!
Upvotes: 1
Views: 1737
Reputation: 87069
By default, any column of a Spark DataFrame can be null. If you need to enforce that some data must not be null, you can either check the data in code before writing it, or use the constraints supported by Delta tables, such as NOT NULL or CHECK (for arbitrary conditions). With these constraints in place, Spark checks the data before writing and fails the write if the data violates the given constraint, like this:
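As a minimal sketch (demo_table, id and amount are made-up names, assuming a Databricks runtime that supports Delta constraints):

# Made-up example table with a NOT NULL constraint on one column
spark.sql("""
    CREATE TABLE IF NOT EXISTS demo_table (
        id BIGINT NOT NULL,
        amount DOUBLE
    ) USING DELTA
""")

# CHECK constraint for an arbitrary condition
spark.sql("ALTER TABLE demo_table ADD CONSTRAINT amount_positive CHECK (amount > 0)")

# Both of these writes fail with a constraint-violation error:
spark.sql("INSERT INTO demo_table VALUES (NULL, 10.0)")   # violates NOT NULL on id
spark.sql("INSERT INTO demo_table VALUES (1, -5.0)")      # violates the CHECK constraint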
P.S. You can get more information about a table's schema and these constraints with SQL commands such as DESCRIBE TABLE or DESCRIBE TABLE EXTENDED, for example:
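Using the same made-up table name as above:

# Column types and detailed table information, including Delta table properties
spark.sql("DESCRIBE TABLE EXTENDED demo_table").show(truncate=False)

The CHECK constraints typically show up among the table properties in the extended part of the output.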
Upvotes: 2