Venkatesh

Reputation: 107

Delta Live Tables data validation in Databricks

I have received a requirement. Data is incrementally copied to a Bronze-layer live table. Once the data is in the bronze layer, I need to apply data quality checks and load the final data into a Silver live table. I have no idea how to do this.

Could anyone please help me write the code using PySpark in Databricks?

Upvotes: 1

Views: 984

Answers (2)

Alex Ott

Reputation: 87279

You need to follow the DLT Python tutorial.

  • Declare a live table for your bronze layer using Auto Loader or another source type:

import dlt

@dlt.table
def bronze():
  # the format option fills the "..." elided in the original; adjust to your source
  return spark.readStream.format("cloudFiles").option("cloudFiles.format", "json").load(input_path)
  • Declare the silver layer that performs the data transformations and enforces data quality checks using expectations (a fuller sketch of the expectations API follows this list):

@dlt.table
@dlt.expect_or_drop("col1_not_null", "col1 is not null")
def silver():
  # rows violating the expectation are dropped before reaching silver
  return dlt.read_stream("bronze")
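
As an aside on the expectations API: @dlt.expect only records violations, @dlt.expect_or_drop removes the offending rows, and @dlt.expect_or_fail stops the update. Several rules can be bundled with @dlt.expect_all_or_drop; here is a minimal sketch, where col1 and amount are hypothetical column names:

import dlt

# a row is dropped if it violates any rule in the dict; per-rule
# violation counts appear in the pipeline's event log
@dlt.table
@dlt.expect_all_or_drop({
    "col1_not_null": "col1 IS NOT NULL",
    "amount_positive": "amount > 0",
})
def silver():
    return dlt.read_stream("bronze")

Note that this code only runs inside a Delta Live Tables pipeline (the dlt module is not available in a regular notebook session), so attach the notebook to a DLT pipeline rather than running it interactively.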

Upvotes: 1

Pallav Garg

Reputation: 46

You can refer to the Databricks documentation, as the task is fairly basic.

For ingestion into the bronze layer: Auto Loader

For the bronze-to-silver step (applying constraints): https://learn.microsoft.com/en-us/azure/databricks/delta-live-tables/expectations
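
Putting those two links together, a minimal sketch of the whole pipeline might look like the following, assuming a CSV source; the paths, the file format, and the id column are placeholders to replace with your own:

import dlt

@dlt.table
def bronze():
    # Auto Loader ingests new files incrementally; schemaLocation stores the inferred schema
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "csv")
            .option("cloudFiles.schemaLocation", "/mnt/schemas/events")
            .load("/mnt/raw/events"))

@dlt.table
@dlt.expect_or_drop("valid_id", "id IS NOT NULL")  # constraint style from the expectations doc
def silver():
    return dlt.read_stream("bronze")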

Upvotes: 0
