Reputation: 158
Foundry has a concept of changelog datasets, which I am using in order to speed up my ontology syncs. However, I've been told to always build datasets from the 'snapshot' version of the dataset, rather than the 'changelog' version. Why is this?
Upvotes: 1
Views: 400
Reputation: 158
In summary: Changelog datasets (by design) include previous versions of the same row. Unless your transform is designed to handle this, your transform will behave as if there were incorrect or duplicated input data.
Each time a changelog dataset is built, any changes to the input data are appended to it as new rows. This lets Foundry's Object Storage apply just the diff against the currently synced data, minimising the amount of data that needs to be synced.
This means that the changelog dataset is designed to contain multiple entries for each row in the input dataset: every time an input row changes, another entry containing the new version of that row is appended.
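To make the append behaviour concrete, here is a minimal plain-Python sketch. It is not Foundry's implementation: the `append_changes` helper and the `id`, `value` and `build` fields are hypothetical names chosen for illustration, and deletions are ignored to keep it short.

```python
def append_changes(changelog, previous_input, current_input, build_number):
    """Append one changelog entry for every input row that is new or changed."""
    for key, value in current_input.items():
        if previous_input.get(key) != value:
            # Hypothetical entry layout: row key, row value, and the build that wrote it.
            changelog.append({"id": key, "value": value, "build": build_number})
    return changelog


# Two successive states of the input dataset; only row "b" changes.
build_1 = {"a": 10, "b": 20}
build_2 = {"a": 10, "b": 25}

changelog = []
append_changes(changelog, {}, build_1, build_number=1)       # first build: every row is new
append_changes(changelog, build_1, build_2, build_number=2)  # second build: only "b" is appended

# A transform that reads the changelog as if it were ordinary data
# counts both versions of row "b".
naive_total = sum(row["value"] for row in changelog)
snapshot_total = sum(build_2.values())
```

Here the changelog ends up with three entries for a two-row input, so `naive_total` is 55 while the snapshot gives 35. That mismatch is what a downstream transform sees if it does not account for the extra versions.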
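Unless your transform is expecting this, it will treat every historical version as a separate, current row, which looks exactly like incorrect or duplicated input data.
As a result, unless your transform is designed to handle the format of changelog datasets, it's best to build off the 'snapshot' version of the dataset.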
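If a transform does have to read a changelog dataset, "handling the format" usually means collapsing it to the newest version of each row before doing anything else. The sketch below shows that idea in plain Python. The `latest_rows` helper and the `id` / `build` column names are assumptions for illustration, not Foundry's actual changelog schema, so check the real ordering column in your dataset before adapting it.

```python
def latest_rows(changelog, key="id", order="build"):
    """Keep only the newest entry per key, judged by the ordering column."""
    latest = {}
    for row in changelog:
        current = latest.get(row[key])
        if current is None or row[order] > current[order]:
            latest[row[key]] = row
    return list(latest.values())


# Hypothetical changelog contents: row "b" was changed in build 2.
changelog = [
    {"id": "a", "value": 10, "build": 1},
    {"id": "b", "value": 20, "build": 1},
    {"id": "b", "value": 25, "build": 2},
]

current = latest_rows(changelog)
total = sum(row["value"] for row in current)
```

After collapsing, `current` holds one entry per `id` and `total` matches what the snapshot dataset would give. In a real transform the same step would typically be a window or group-by over the primary key, which is extra work the snapshot version avoids entirely.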
Upvotes: 0