Reputation: 151
I'm starting to use Pentaho Data Integration, and I intend to use it to update a data lake with data from a server. However, I only need to add data that does not yet exist in the data lake (an incremental load).
Example SQL:
SELECT COLUM1, COLUM2, COLUM3, COLUM4 FROM TABLEX
I don't know whether I can do this incremental load via SQL, a filter, or some other way.
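If the source table and the lake table can both be reached from one database connection, the increment can be done purely in SQL with an anti-join (`INSERT ... SELECT ... WHERE NOT EXISTS`). The sketch below is a minimal illustration using SQLite in memory; it assumes `COLUM1` is the key that identifies a row and uses a hypothetical target table named `TABLEX_LAKE`. When source and lake live in different systems (the usual case in PDI), this exact statement does not apply, and a lookup-based approach inside the transformation is needed instead.

```python
import sqlite3


def incremental_insert(conn: sqlite3.Connection) -> int:
    """Copy rows from TABLEX into TABLEX_LAKE whose COLUM1 key is not there yet.

    Assumption: COLUM1 is the business key; TABLEX_LAKE is a hypothetical name.
    """
    cur = conn.execute(
        """
        INSERT INTO TABLEX_LAKE (COLUM1, COLUM2, COLUM3, COLUM4)
        SELECT s.COLUM1, s.COLUM2, s.COLUM3, s.COLUM4
        FROM TABLEX AS s
        WHERE NOT EXISTS (
            SELECT 1 FROM TABLEX_LAKE AS t WHERE t.COLUM1 = s.COLUM1
        )
        """
    )
    conn.commit()
    return cur.rowcount  # number of rows actually inserted


# Demo: source has keys 1..3, the lake already holds key 1.
conn = sqlite3.connect(":memory:")
for name in ("TABLEX", "TABLEX_LAKE"):
    conn.execute(
        f"CREATE TABLE {name} (COLUM1 INTEGER PRIMARY KEY, "
        "COLUM2 TEXT, COLUM3 TEXT, COLUM4 TEXT)"
    )
conn.executemany(
    "INSERT INTO TABLEX VALUES (?, ?, ?, ?)",
    [(1, "a", "b", "c"), (2, "d", "e", "f"), (3, "g", "h", "i")],
)
conn.execute("INSERT INTO TABLEX_LAKE VALUES (1, 'a', 'b', 'c')")
conn.commit()

inserted = incremental_insert(conn)
lake_keys = [r[0] for r in conn.execute("SELECT COLUM1 FROM TABLEX_LAKE ORDER BY COLUM1")]
```

Running `incremental_insert` a second time inserts nothing, because every source key now exists in the target.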
Reputation: 171
Let's keep it simple: use the Stream lookup and Filter rows steps.
First, look up each source row in the target table in the lake using some keys (business key, etc.) and return a new column named checker (set checker to 1 in the SELECT clause of the lookup input, e.g. `1 AS checker`).
Second, if checker = 1 (the record already exists in the target), do nothing; otherwise, insert the new record into the target.
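The two steps above can be sketched outside PDI to show the logic. This is a plain-Python illustration, not PDI code: the target rows are loaded into an in-memory dictionary (as the lookup stream would be), every matched source row gets `checker = 1`, unmatched rows get `None` (the lookup's default), and a filter keeps only the rows whose checker is not 1. The key column `COLUM1` and the helper names are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, object]


def stream_lookup(source: Iterable[Row], target: Iterable[Row],
                  keys: Tuple[str, ...]) -> List[Row]:
    """Attach a 'checker' field: 1 if the key exists in target, else None."""
    # Load the lookup side into memory; each target key maps to checker = 1.
    lookup = {tuple(r[k] for k in keys): 1 for r in target}
    return [{**row, "checker": lookup.get(tuple(row[k] for k in keys))}
            for row in source]


def filter_new(rows: Iterable[Row]) -> List[Row]:
    """Keep rows whose checker is not 1 and drop the helper field."""
    return [{k: v for k, v in r.items() if k != "checker"}
            for r in rows if r["checker"] != 1]


# Demo with hypothetical data: the lake already contains key 1.
source = [
    {"COLUM1": 1, "COLUM2": "a"},
    {"COLUM1": 2, "COLUM2": "b"},
    {"COLUM1": 3, "COLUM2": "c"},
]
target = [{"COLUM1": 1, "COLUM2": "a"}]

new_rows = filter_new(stream_lookup(source, target, ("COLUM1",)))
target.extend(new_rows)  # stands in for the output step writing to the lake
```

Because the whole lookup side is held in memory, this pattern suits target key sets that fit in RAM; for very large targets, a database-side lookup may be preferable.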
Reputation: 2440
There are multiple ways to achieve this.
Example:
Use two Table input steps (source and target) and add an Add a checksum step after each one, then compare the source and target checksums; if a source checksum does not match any target checksum, insert that row into the target.
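The checksum comparison can be illustrated in plain Python. This sketch is an assumption-laden stand-in for the PDI steps: it uses MD5 from `hashlib` over the row's column values (joined with a separator so that `("ab", "c")` and `("a", "bc")` hash differently), builds the set of target checksums, and returns the source rows whose checksum is not in that set. The column names and helper functions are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List, Sequence

Row = Dict[str, object]


def row_checksum(row: Row, columns: Sequence[str]) -> str:
    """MD5 over the row's values, joined with a unit-separator character."""
    joined = "\x1f".join("" if row[c] is None else str(row[c]) for c in columns)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def rows_to_insert(source: Iterable[Row], target: Iterable[Row],
                   columns: Sequence[str]) -> List[Row]:
    """Return source rows whose checksum does not appear in the target."""
    target_sums = {row_checksum(r, columns) for r in target}
    return [r for r in source if row_checksum(r, columns) not in target_sums]


COLUMNS = ("COLUM1", "COLUM2", "COLUM3", "COLUM4")
source = [
    {"COLUM1": 1, "COLUM2": "a", "COLUM3": "b", "COLUM4": "c"},
    {"COLUM1": 2, "COLUM2": "d", "COLUM3": "e", "COLUM4": "f"},
]
target = [{"COLUM1": 1, "COLUM2": "a", "COLUM3": "b", "COLUM4": "c"}]

new_rows = rows_to_insert(source, target, COLUMNS)
```

Note the design trade-off: because the checksum covers every column, a row whose values changed in the source also fails to match and would be inserted again as a new row, so changed records may need separate handling if duplicates in the lake are not acceptable.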