IamDataEngineer

Reputation: 45

Databricks Initial Load for Delta live table

I'm working on an initial load for Delta Live Tables (DLT) on Databricks.

In my production pipeline, the target table is SCD type 1 and the source data arrives through Event Hubs, which retains only 7 days of data from upstream.

Any historical records older than those 7 days have to be brought in through a separate initial load.

Pipeline:

source system table ---> Event Hubs ---> landing CDC table ---> target silver table

Initial load:

source system table ---> ADF ---> ADLS (Parquet files) ---> target silver table

When I try to run the initial load, I get the error shown at the end of this post.

I tried running the load from a different Delta Live Tables pipeline (I understand that no two pipelines should contain the same table) and I also tried swapping the notebooks in the existing pipeline, but I still have the same problem.

How can I resolve this issue?

Error:

org.apache.spark.sql.catalyst.ExtendedAnalysisException: Table '<Initial_load_table>' is already managed by pipeline . A table can only be owned by one pipeline. Concurrent pipeline operations such as maintenance and full refresh will conflict with each other.

Upvotes: 0

Views: 898

Answers (1)

Anupam Chand

Reputation: 2687

DLT expects all the tables to be defined in the same pipeline. It doesn't work well with tables that are already defined elsewhere, because it expects to take responsibility for a lot of the maintenance itself, such as optimizations and vacuuming.

Is either of the pipelines already in production? If you haven't deployed yet, why not consider putting both loads in the same pipeline? You can still keep them as separate notebooks, but both need to use the DLT framework and both need to be executed in the same pipeline. Both will run in a streaming fashion, so the batch (historical) load will pick up the data when it's available (you need to use Auto Loader for this). You will need to do a union of the two inputs to give you your silver table.
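To show why a union of the two inputs can still give a correct SCD 1 result, here is a minimal plain-Python sketch of the semantics. It is not DLT code and does not use any Databricks API. The field names `id` and `seq` and the helper `apply_scd1` are hypothetical, and it assumes the historical rows carry a sequence value lower than any CDC event, so that later changes overwrite the initial load.

```python
def apply_scd1(records):
    """Keep, for each key, the record with the highest sequence value (SCD 1)."""
    latest = {}
    for rec in records:
        key = rec["id"]  # hypothetical business key
        # Overwrite only if this record is at least as new as the stored one.
        if key not in latest or rec["seq"] >= latest[key]["seq"]:
            latest[key] = rec
    return latest


# Historical snapshot (the ADF -> ADLS Parquet path); assumed seq = 0.
historical = [
    {"id": 1, "name": "alice", "seq": 0},
    {"id": 2, "name": "bob", "seq": 0},
]

# Recent change events (the Event Hubs -> landing CDC path).
cdc = [
    {"id": 2, "name": "bobby", "seq": 5},
    {"id": 3, "name": "carol", "seq": 6},
]

# Union of both inputs, then one SCD 1 upsert into the silver table.
silver = apply_scd1(historical + cdc)
```

Because the newest record per key wins, the result does not depend on which input arrives first, which matters when both sources are read as streams in the same pipeline.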

When all historical data is loaded, you can choose to either leave the pipeline as it is or do a deployment removing the historical notebook.

Upvotes: 0
