tommyhmt

Reputation: 327

break DAG lineage in DLT

I have an iterative transformation applied to a DataFrame. It used to take a long time, and after a lot of research online it appears the issue was the DAG growing exponentially. To fix this I came across a solution: break the lineage by converting the DataFrame to an RDD and back to a DataFrame after each transformation in the loop. This works wonders when applied to a normal table, but now I'm using DLT and I'm getting this error:

Queries with streaming sources must be executed with writeStream.start();

Is there any way to resolve this?

Upvotes: 0

Views: 86

Answers (1)

The error indicates that the foreachBatch operation is not recognized by Delta Live Tables (DLT):

Queries with streaming sources must be executed with writeStream.start();

The foreachBatch operation is not supported in Delta Live Tables (DLT) for streaming queries.

To work around this limitation, you can take the following approach:

Instead of writing directly to the DLT target table within the foreachBatch operation, write the intermediate results to a temporary table. After processing each micro-batch, store the results in this temporary table. Finally, use a separate job or process to periodically merge the data from the temporary table into your target table.
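A sketch of that workaround, run as a regular job outside the DLT pipeline. This assumes a Databricks/Delta runtime; `source_stream_df`, `apply_iterative_transform`, the table names, the checkpoint path, and the `id` join key are all illustrative, not part of any real API:

```python
from delta.tables import DeltaTable

def process_batch(batch_df, batch_id):
    # Run the iterative transformation on the micro-batch and land the
    # result in an intermediate Delta table instead of the DLT target.
    transformed = apply_iterative_transform(batch_df)  # hypothetical helper holding your loop
    transformed.write.format("delta").mode("append").saveAsTable("temp_results")

(source_stream_df.writeStream
    .foreachBatch(process_batch)
    .option("checkpointLocation", "/tmp/checkpoints/temp_results")  # illustrative path
    .start())

# A separate job then periodically merges temp_results into the real target:
target = DeltaTable.forName(spark, "target_table")
(target.alias("t")
    .merge(spark.table("temp_results").alias("s"), "t.id = s.id")  # assumed key column
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())
```

Because foreachBatch runs outside the streaming query plan, the lineage-breaking loop can use the usual batch tricks (RDD round-trip or checkpoint) on each `batch_df` without hitting the writeStream.start() error.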

Reference: DLT fails with Queries with streaming sources must be executed with writeStream.start();

Upvotes: 0
