Suresh Alathur

Reputation: 93

How to retrieve duplicate records in Talend Integration


I want to retrieve duplicate records using Talend Open Studio for Data Integration. Example records:

id  name
1   suresh
2   ramesh
3   nagesh
4   suresh

Could anyone please help with this? The expected result is:

id  name
1   suresh
4   suresh

Thanks in advance.

Upvotes: 2

Views: 4528

Answers (2)

Suresh Alathur

Reputation: 93


I finally managed to find the duplicate records. I used the following steps:


1. Connect the delimited file input (tFileInputDelimited) to tUniqRow.
2. Send the duplicate rows from tUniqRow to tAggregateRow, and in tAggregateRow group by id.
3. Connect tAggregateRow to tMap. In tMap, join on id == id and make sure the join model is Inner Join.
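To make the tUniqRow step concrete, here is a small Python model of how tUniqRow splits its input as I understand it: the first row seen for each key goes to the Uniques output and every later row with the same key goes to Duplicates. This is a sketch, not Talend code; the function name uniq_row_split is hypothetical, and it assumes the tUniqRow key column is name.

```python
from typing import List, Tuple

Row = Tuple[int, str]  # (id, name)


def uniq_row_split(rows: List[Row], key_index: int) -> Tuple[List[Row], List[Row]]:
    """Hypothetical model of tUniqRow: first row per key -> uniques,
    every later row with an already-seen key -> duplicates."""
    seen = set()
    uniques: List[Row] = []
    duplicates: List[Row] = []
    for row in rows:
        key = row[key_index]
        if key in seen:
            duplicates.append(row)
        else:
            seen.add(key)
            uniques.append(row)
    return uniques, duplicates


rows = [(1, "suresh"), (2, "ramesh"), (3, "nagesh"), (4, "suresh")]
uniques, duplicates = uniq_row_split(rows, key_index=1)  # assumed key column: name
print(uniques)     # [(1, 'suresh'), (2, 'ramesh'), (3, 'nagesh')]
print(duplicates)  # [(4, 'suresh')]
```

Under this model the Duplicates flow alone contains only id 4, so the duplicated key values have to be matched back against the full input to also recover id 1.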

(Screenshot: example join condition in tMap)

Upvotes: 1

xto
xto

Reputation: 416

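Because the Duplicates output of tUniqRow doesn't return what you need here (it only gives the second and later occurrences of each key, so id 1 would be missing), you can use a workaround. I split your task into two steps.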

First, you need to find the names that are duplicated. You can do this with a tAggregateRow component: group by name and count the number of ids. Then filter the rows where count > 1 and save the result in tHashOutput. tHashOutput keeps the rows in memory, so they can be used later in the job.
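The first step (group by name, count, keep count > 1, hold the result in memory) can be sketched in plain Python. This only mirrors the logic of the tAggregateRow, filter and tHashOutput components; the variable names are hypothetical and a dict stands in for tHashOutput's in-memory storage.

```python
from collections import Counter
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # (id, name)

rows: List[Row] = [(1, "suresh"), (2, "ramesh"), (3, "nagesh"), (4, "suresh")]

# tAggregateRow equivalent: group by name and count the ids in each group.
id_count_by_name = Counter(name for _id, name in rows)

# Filter equivalent: keep only the groups with count > 1.
# The resulting dict plays the role of the rows saved in tHashOutput.
duplicated_names: Dict[str, int] = {
    name: count for name, count in id_count_by_name.items() if count > 1
}
print(duplicated_names)  # {'suresh': 2}
```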

(Screenshot: first-step job)

In the second step, read your data again and use tMap to match it against the results saved in tHashOutput. If you set Join Model = Inner Join in tMap, the output will contain only the rows whose names exist in the saved duplicates.
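Putting both steps together, a minimal Python sketch of the same idea: build the in-memory lookup of duplicated names, then re-read the input and inner-join it against that lookup on name, so rows without a match are dropped. The function find_duplicate_rows is hypothetical, and it assumes the join key is name (as in the steps above); it models the logic only, not the Talend job itself.

```python
from collections import Counter
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # (id, name)


def find_duplicate_rows(rows: List[Row]) -> List[Row]:
    # Step 1 (tAggregateRow + filter + tHashOutput): names that occur
    # more than once, kept in memory as a lookup table.
    counts = Counter(name for _id, name in rows)
    lookup: Dict[str, int] = {name: c for name, c in counts.items() if c > 1}

    # Step 2 (tMap with Join Model = Inner Join on name): re-read the input
    # and keep only rows whose name has a match in the lookup.
    return [(row_id, name) for row_id, name in rows if name in lookup]


rows = [(1, "suresh"), (2, "ramesh"), (3, "nagesh"), (4, "suresh")]
for row_id, name in find_duplicate_rows(rows):
    print(row_id, name)
# 1 suresh
# 4 suresh
```

Because the lookup has one entry per duplicated name, the inner join behaves like a filter and every occurrence of a duplicated name, including the first one, is kept.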

(Screenshot: second-step job)

Upvotes: 3
