Reputation: 66
Ideally, historical data is loaded first and current data afterwards, but we already have current data loaded into Snowflake from Kafka as Upsert outputs. We have to ingest the historical data later, and it will come from a different source, let's say S3 dumps. Can we accomplish this?
Upvotes: 0
Views: 59
Reputation: 66
You definitely need to create a new data source for the historical data. That is how the historical data gets ingested into Upsolver.
Design considerations for the next step: if the output were append-only (rows are only ever inserted), your Snowflake output could read from both the historical and the current data source combined with a UNION. You can either add multiple data sources when creating the output in the UI, or edit the SQL to add a UNION of both data sources; either way, both historical and current data land in the target table. This design would also handle the Upsert use case, provided the historical data came first and was fully ingested before the current data source started receiving data.
However, in your case the historical data arrives later, so this approach won't work: historical rows ingested after the current data could upsert over the latest values and overwrite them.
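To make the ordering problem concrete, here is a minimal Python sketch. It is not Upsolver or Snowflake code, only a simulation that assumes an upsert keeps the last row to arrive for each key. The `apply_upserts` helper, the `id`/`status`/`updated_at` fields, and the sample rows are all made up for illustration.

```python
def apply_upserts(target, batch):
    """Apply rows in arrival order; the last row to arrive for a key wins."""
    for row in batch:
        target[row["id"]] = row
    return target


# Hypothetical sample rows: historical data holds older versions of records.
historical = [
    {"id": 1, "status": "created", "updated_at": "2023-01-01"},
    {"id": 2, "status": "created", "updated_at": "2023-01-02"},
]
# Current data holds the latest version of record 1.
current = [
    {"id": 1, "status": "shipped", "updated_at": "2024-06-01"},
]

# Historical fully ingested first, then current: the latest value survives.
in_order = apply_upserts(apply_upserts({}, historical), current)

# Current ingested first, historical arriving later: the old value overwrites it.
late_history = apply_upserts(apply_upserts({}, current), historical)

print(in_order[1]["status"])      # shipped
print(late_history[1]["status"])  # created
```

In the second ordering, record 1 ends up with the stale historical value, which is exactly the overwrite described above.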
Solution 1: If you can stop the current data source until the historical data is completely processed.
Solution 2: If the current data volume is small and can be reprocessed from beginning then
Solution 3:
Hope this helps.
Upvotes: 0