Kan
Kan

Reputation: 213

delete duplicate rows using informatica

How can we delete duplicate rows from source database table using informatica. This has to be done without using any other table. Means source and target is same table in Oracle database.

Upvotes: 0

Views: 9667

Answers (4)

Suneel Reddy
Suneel Reddy

Reputation: 1

I could think of two solutions, I'm not sure if these works though..

In the Source Qualifier, write the query to get the 'RowIDs' of the duplicate records and,

  1. Pass the RowID to the target and delete the records using Target Update Override, OR
  2. Try to delete using the 'SQL transformation' by passing the RowID.

Let me know if you've already got the solution.

Upvotes: 0

Nav
Nav

Reputation: 1394

Here I am writing 2 approaches I can think of

If you are allowed to write Query in Source Qualifier 1.a. Write a Query in Source Qualifier and select the data for Duplicate Rows(i.e.SELECT (* FROM table1 GROUP BY HAVING COUNT(*) > 1) 1.b. Use the columns selected to delete that row using Update Strategy DD_DELETE

If you need to bring all data in Informatica 2.a. As its a table bring Sorted Data or use sorter to sort data on keys 2.b. Use expression to find the duplicates using variable as IIF(KEY_PREV=KEY_CURR,Duplicate) 2.c. Take the duplicate records forward using filter 2.d. Use update strategy DD_DELETE to delete the records

Upvotes: 0

Maciejg
Maciejg

Reputation: 3353

Here's my evil idea. However just as above there is an assumption that we have some surogate key that uniquely identifies each row. So here goes:

  1. Add a Lookup Transformation with Dynamic lookup cache and Source filter set to something like 1=2 - we don't want any data to be read by the lookup in fact.
  2. Set up the Condition to compare all the ports you want to use to determine duplicates (eg. all but the surogate key)
  3. Each new row instance will be assigned the NewLookupPort=1 value. Each subsequent instance (i.e. duplicate) will result in NewLookupPort=2.
  4. Use Filter Transformation to discard all NewLookupPort=1 rows
  5. Use Update Transformation to set DD_DELETE for all the other rows.

As said, this assumes the existence of some surrogate key in the table.

Upvotes: 1

Marek Grzenkowicz
Marek Grzenkowicz

Reputation: 17343

You can use the Rank transformation to find duplicates:

  1. Group by all columns but the primary key.
  2. Check the Rank option for the primary key.
  3. Set the Number of Ranks to be equal to or greater than the maximum number of duplicates you are expecting for a single record.
  4. RANKINDEX will equal 1 for unique rows and every first duplicate, so use a Filter transformation to pass only rows with RANKINDEX > 1 to the target and use DD_DELETE update strategy.

Upvotes: 4

Related Questions