arun bandhakavi

Reputation: 31

Read data that is already in output and write back to the output

I have a requirement to read the data that is already in the output, join it to the input, and write the result back to the same output. This build is scheduled to run every day.

Input:

ID  Refresh_Date
1   6/8/2022
2   6/8/2022
3   6/8/2022

Historical(Output):

ID  Order Date  Order Closure  Age
1   6/6/2022    6/7/2022       1
2   6/7/2022
3   6/7/2022
4   6/7/2022

The input data is refreshed with new orders every day, so I have to join the input to the historical data to find each order's closure date and the time it took to close. The result of the join should be saved as Historical again.

I tried using incremental computation, but reading the output always gives me an empty dataset.

Upvotes: 3

Views: 686

Answers (1)

3yakuya

Reputation: 2672

Your intuition to use the @incremental decorator is correct.

It sounds like your problem is related to the mode in which you are reading the dataframe. Check out the documentation on incremental modes of inputs and outputs. In particular, the default mode is added, while you'd probably want to use current or previous for your implementation, as these are the modes that give you access to the data currently within the dataset.

Also, the documentation on the incremental decorator is very helpful for understanding how to make incremental computation work for you. Have a look at the parameters you can pass to the decorator, in particular snapshot_inputs, as it may also affect how you access the input dataset.
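To make the read, merge, write-back cycle concrete, here is a minimal sketch of the daily merge in plain Python rather than Foundry code. Everything in it is an assumption for illustration: the function name merge_history, the field names, and especially the closing rule (an open order that no longer appears in the day's input is treated as closed on the refresh date), since the question does not spell the rule out. In an actual transform, the history would presumably come from reading the output in previous mode, and the merged result would be written back to the same output.

```python
from datetime import date


def merge_history(history, open_ids, refresh_date):
    """Return a new history list after applying one day's input.

    history      -- rows from the previous output (list of dicts)
    open_ids     -- IDs present in today's input (assumed to be still open)
    refresh_date -- the Refresh_Date of today's input
    """
    merged = []
    seen = set()
    for row in history:
        row = dict(row)  # copy, so the previous output is not mutated
        seen.add(row["id"])
        # Assumed rule: an open order missing from today's input closed today.
        if row["closure_date"] is None and row["id"] not in open_ids:
            row["closure_date"] = refresh_date
            row["age"] = (refresh_date - row["order_date"]).days
        merged.append(row)
    # Assumed rule: IDs not yet in the history are new orders placed today.
    for order_id in sorted(open_ids - seen):
        merged.append({"id": order_id, "order_date": refresh_date,
                       "closure_date": None, "age": None})
    return merged


# The Historical table from the question (dates read as month/day/year).
history = [
    {"id": 1, "order_date": date(2022, 6, 6),
     "closure_date": date(2022, 6, 7), "age": 1},
    {"id": 2, "order_date": date(2022, 6, 7), "closure_date": None, "age": None},
    {"id": 3, "order_date": date(2022, 6, 7), "closure_date": None, "age": None},
    {"id": 4, "order_date": date(2022, 6, 7), "closure_date": None, "age": None},
]

# Today's input contains IDs 1, 2 and 3 with Refresh_Date 6/8/2022.
new_history = merge_history(history, {1, 2, 3}, date(2022, 6, 8))
```

Under these assumptions, order 4 is the only row that changes: it gets a closure date of 6/8/2022 and an age of 1 day, while orders 2 and 3 stay open. The key point for the incremental build is that the function needs the full previous history as input, which is why the read mode of the output matters.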

Upvotes: 1
