Reputation: 213
How can we delete duplicate rows from source database table using informatica. This has to be done without using any other table. Means source and target is same table in Oracle database.
Upvotes: 0
Views: 9667
Reputation: 1
I could think of two solutions, I'm not sure if these works though..
In the Source Qualifier, write the query to get the 'RowIDs' of the duplicate records and,
Let me know if you've already got the solution.
Upvotes: 0
Reputation: 1394
Here I am writing 2 approaches I can think of
If you are allowed to write Query in Source Qualifier 1.a. Write a Query in Source Qualifier and select the data for Duplicate Rows(i.e.SELECT (* FROM table1 GROUP BY HAVING COUNT(*) > 1) 1.b. Use the columns selected to delete that row using Update Strategy DD_DELETE
If you need to bring all data in Informatica 2.a. As its a table bring Sorted Data or use sorter to sort data on keys 2.b. Use expression to find the duplicates using variable as IIF(KEY_PREV=KEY_CURR,Duplicate) 2.c. Take the duplicate records forward using filter 2.d. Use update strategy DD_DELETE to delete the records
Upvotes: 0
Reputation: 3353
Here's my evil idea. However just as above there is an assumption that we have some surogate key that uniquely identifies each row. So here goes:
Lookup Transformation
with Dynamic lookup cache
and Source filter
set to something like 1=2
- we don't want any data to be read by the lookup in fact.Condition
to compare all the ports you want to use to determine duplicates (eg. all but the surogate key)NewLookupPort=1
value. Each subsequent instance (i.e. duplicate) will result in NewLookupPort=2
.Filter Transformation
to discard all NewLookupPort=1
rowsUpdate Transformation
to set DD_DELETE
for all the other rows.As said, this assumes the existence of some surrogate key in the table.
Upvotes: 1
Reputation: 17343
You can use the Rank transformation to find duplicates:
Rank
option for the primary key. Number of Ranks
to be equal to or greater than the maximum number of duplicates you are expecting for a single record. RANKINDEX
will equal 1 for unique rows and every first duplicate, so use a Filter transformation to pass only rows with RANKINDEX > 1
to the target and use DD_DELETE
update strategy. Upvotes: 4