Reputation: 31
I have a requirement to read the data that is already in the output, join it to the input, and write the result back to the same output. This build is scheduled to run every day.
Input:
ID | Refresh_Date |
---|---|
1 | 6/8/2022 |
2 | 6/8/2022 |
3 | 6/8/2022 |
Historical (Output):
ID | Order Date | Order Closure | Age |
---|---|---|---|
1 | 6/6/2022 | 6/7/2022 | 1 |
2 | 6/7/2022 | | |
3 | 6/7/2022 | | |
4 | 6/7/2022 | | |
The input data will be refreshed with new orders every day, so I have to join the input to the historical data and find the closure date and the time it took to close the order. The result of the join should be saved as Historical again.
I tried using incremental computation, but reading the output always gives me an empty dataset.
Upvotes: 3
Views: 686
Reputation: 2672
Your intuition to use the `@incremental` decorator is correct.
It sounds like your problem is related to the mode in which you are accessing the current dataframe. Check out the documentation on incremental modes of inputs and outputs; in particular, the default mode is `added`, while you'd probably want to use `current` or `previous` for your implementation, as these are the modes that give you access to the data currently within the dataset.
Also, the documentation on the incremental decorator is overall very helpful for understanding how to make incremental computation work for you. Have a look at the different parameters you can pass to your decorator, in particular `snapshot_inputs`, as it may affect how you access the input dataset as well.
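To make this concrete, here is a rough sketch of the "read previous output, join, replace" pattern. It only runs inside a Foundry code repository, and a lot of it is assumed: the dataset paths, the parameter names `new_orders` and `historical`, the underscore column names, and the coalesce rule for new IDs are all placeholders, since the question doesn't say how the closure date is derived. As far as I recall from the documentation, reading the output with `previous` requires passing a schema, and reading an output without an explicit mode gives you what has been written in the current build, which may be why you see an empty dataset.

```python
from pyspark.sql import functions as F
from pyspark.sql import types as T
from transforms.api import transform, incremental, Input, Output

# Assumed schema of the historical output (column names are placeholders).
HISTORICAL_SCHEMA = T.StructType([
    T.StructField("ID", T.IntegerType()),
    T.StructField("Order_Date", T.DateType()),
    T.StructField("Order_Closure", T.DateType()),
    T.StructField("Age", T.IntegerType()),
])


# snapshot_inputs lists inputs whose snapshot updates should not
# force the whole transform to run non-incrementally.
@incremental(snapshot_inputs=["new_orders"])
@transform(
    historical=Output("/path/to/historical"),  # hypothetical path
    new_orders=Input("/path/to/input"),        # hypothetical path
)
def compute(new_orders, historical):
    # Data written by the previous build; on the very first run this
    # is an empty dataframe with the schema given above.
    previous = historical.dataframe("previous", HISTORICAL_SCHEMA)

    # The input is treated as a snapshot, so this reads all of it.
    current_input = new_orders.dataframe()

    # Keep every ID from either side.
    joined = previous.join(current_input, on="ID", how="full_outer")

    # Placeholder rule: IDs seen for the first time take Refresh_Date
    # as their order date. Your closure and age logic would go here.
    result = joined.select(
        "ID",
        F.coalesce(F.col("Order_Date"), F.col("Refresh_Date")).alias("Order_Date"),
        "Order_Closure",
        "Age",
    )

    # Overwrite the output with the merged result instead of appending.
    historical.set_mode("replace")
    historical.write_dataframe(result)
```

The key parts are the `"previous"` read mode on the output and `set_mode("replace")` before writing; the join in the middle is just a stand-in for whatever logic you need.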
Upvotes: 1