Reputation: 971
This is my first Talend ETL job. I managed to create a job. But it takes all the data from first Database and inserts into the second database.
But, when the job runs next time, it should only insert the new records from the first DB, it should update if it is any update on the first DB and delete the record if it is deleted from the first DB.
Is there any component exists in Talend to achieve this? Please help.
Upvotes: 1
Views: 1116
Reputation: 1627
Full compare CDC - (change data capture) is essentially what you are describing .
Depending on the volume of data and frequency of synchronization, full compare CDC may not be practical.
If you don't use database side CDC or commercial Talend CDC components, you can implement your own compare algorithm in Talend open studio.
Determine what the unique field (or fields) serve as the common key in the source and target table .
For each job run, select all data in the source table and all data in the target table .
For unique source keys that don't appear in the Target table; do an insert.
For unique source keys which do appear in he target, compare data further to determine if an update to the target is appropriate.
For unique keys that appear in the target table but not in the source, delete the target row. If you have other tables in the target system which reference data about to be deleted, evaluate what the right course of action would be . That depends largely on how the target System is being used and the purpose of it.
Note - sometimes people choose to load the source data into a staging table in the target system so comparisons can be performed in SQL.
Upvotes: 1