Venkatesh

Reputation: 107

Delta Live Tables data validation in Databricks

I have received a requirement. Data is incrementally copied to a Bronze-layer live table. Once the data is in the bronze layer, I need to apply data quality checks and load the final data into a Silver live table. I have no idea how to do this.

Could anyone please help me write the code using PySpark in Databricks?

Upvotes: 1

Views: 984

Answers (2)

Alex Ott

Reputation: 87279

You need to follow the DLT Python tutorial.

  • Declare a live table for your bronze layer using Auto Loader or another source type:

import dlt

@dlt.table
def bronze():
  # the format option fills the "..." elided in the original; adjust to your source
  return spark.readStream.format("cloudFiles").option("cloudFiles.format", "json").load(input_path)
  • Declare the silver layer that performs the data transformations and enforces data quality checks using expectations (a fuller sketch of the expectations API follows this list):

@dlt.table
@dlt.expect_or_drop("col1_not_null", "col1 is not null")
def silver():
  # rows violating the expectation are dropped before reaching silver
  return dlt.read_stream("bronze")
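
As an aside on the expectations API: @dlt.expect only records violations, @dlt.expect_or_drop removes the offending rows, and @dlt.expect_or_fail stops the update. Several rules can be bundled with @dlt.expect_all_or_drop; here is a minimal sketch, where col1 and amount are hypothetical column names:

import dlt

# a row is dropped if it violates any rule in the dict; per-rule
# violation counts appear in the pipeline's event log
@dlt.table
@dlt.expect_all_or_drop({
    "col1_not_null": "col1 IS NOT NULL",
    "amount_positive": "amount > 0",
})
def silver():
    return dlt.read_stream("bronze")

Note that this code only runs inside a Delta Live Tables pipeline (the dlt module is not available in a regular notebook session), so attach the notebook to a DLT pipeline rather than running it interactively.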

Upvotes: 1

Pallav Garg

Reputation: 46

You can refer to the Databricks documentation, as the task is fairly basic.

For ingestion into the bronze layer: Auto Loader

For the bronze-to-silver step (applying constraints): https://learn.microsoft.com/en-us/azure/databricks/delta-live-tables/expectations
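
Putting those two links together, a minimal sketch of the whole pipeline might look like the following, assuming a CSV source; the paths, the file format, and the id column are placeholders to replace with your own:

import dlt

@dlt.table
def bronze():
    # Auto Loader ingests new files incrementally; schemaLocation stores the inferred schema
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "csv")
            .option("cloudFiles.schemaLocation", "/mnt/schemas/events")
            .load("/mnt/raw/events"))

@dlt.table
@dlt.expect_or_drop("valid_id", "id IS NOT NULL")  # constraint style from the expectations doc
def silver():
    return dlt.read_stream("bronze")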

Upvotes: 0
