sreekem bose
sreekem bose

Reputation: 471

Keep all unique data in one column in SQL

I have a query to check the duplicates from a table:-

(SELECT assignment_name 
 FROM   (SELECT xx.supervisor_stg.*, 
                Row_number() 
                  over ( 
                    PARTITION BY assignment_name 
                    ORDER BY assignment_name) AS c 
         FROM   xx.supervisor_stg) 
 WHERE  c > 1) 

I want to delete the duplicate data from these data but such that if one value like assignment_name 'Smith' is written thrice then 1 value should remain and the other 2 should be deleted.

When i am using the following query i am getting an error :

Delete From supervisor_stg
where asg_name <>(SELECT Asg_Name 
 FROM   (SELECT xx.supervisor_stg.*, 
                Row_number() 
                  over ( 
                    PARTITION BY assignment_name 
                    ORDER BY assignment_name) AS c 
         FROM   xx.supervisor_stg) 
 WHERE  c > 1) 


SQL Error: ORA-01427: single-row subquery returns more than one row
01427. 00000 -  "single-row subquery returns more than one row"
*Cause:    
*Action:

Upvotes: 0

Views: 125

Answers (3)

Thorsten Kettner
Thorsten Kettner

Reputation: 94939

In Oracle you can delete from a subquery, e.g. delete from (select * from emp where empno < 10). Well, the example is stupid, but you get the idea. In your case this would be:

delete from
(
  select *
  from
  (
    select 
      s.*, 
      row_number() over (partition by assignment_name order by assignment_name) as rn
    from xx.supervisor_stg s
  )
  where rn > 1
);

Although this should work, it may happen that Oracle doesn't except this for some reason. Update: I've tried this in Oracle 11.2 and Oracle gives me ORA-01752: "cannot delete from view without exactly one key-preserved table". So although we are selecting from one table only, Oracle gets confused somehow. Read on and pick another solution :-)

An alternative would be to delete every record for which doesn't exist another entry with the same assignment_name and a lower (or higher) rowid.

delete from xx.supervisor_stg s
where not exists
(
  select *
  from xx.supervisor_stg other
  where other.assignment_name = s.assignment_name
  and other.rowid < s.rowid
);

Instead of a correlated subquery with NOT EXISTS you can use a non-correlated one with NOT IN, where you have all minimum (or maximum) rowids in a set and you delete all others.

delete from xx.supervisor_stg
where rowid not in
(
  select min(rowid)
  from xx.supervisor_stg
  group by assignment_name
);

As to your own delete statement:

  • Your subquery is not correlated to your main query, so you get many records. You'd have to replace <> with NOT IN for this reason.
  • <> makes no sense in combination with WHERE c > 1, because the latter gives you the records to delete not those to keep. Either make this IN (rather than NOT IN) or make it WHERE c = 1.
  • But then: You are comparing asg_name only. But the asg_name is the same for records to be deleted and records to keep. You'd have to use something that distinguishes records. The rowid is perfect for this.

You'd get:

Delete From supervisor_stg
Where rowid In
(
  SELECT rowid 
  FROM 
  (
    SELECT 
      xx.supervisor_stg.*, 
      Row_number() over (PARTITION BY assignment_name ORDER BY assignment_name) AS c 
    FROM xx.supervisor_stg
  ) 
  WHERE c > 1
);

which is overly complicated. As shown you can use simple aggregation (min(rowid) / group by assignment_name). You'd use a window function such as row_number() to avoid to have to read a table twice like in my first query. In your query you are still reading the table twice, once to find the records to keep, and then another time to find the records to delete.

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1269973

In Oracle, you can use rowid for this purpose:

delete from xx.SUPERVISOR_STG
    where rowid <> (select min(s2.rowid)
                    from xx.SUPERVISOR_STG
                    where s2.assignment_name = SUPERVISOR_STG.assignment_name
                   );

Upvotes: 0

Jens
Jens

Reputation: 3299

;WITH numbered AS (
    SELECT ROW_NUMBER() OVER(PARTITION BY [dupe-column-list] ORDER BY [dupe-column-list]) AS _dupe_num 
    FROM [table-name] 
    WHERE 1=1 -- any where clause if required
)
DELETE FROM numbered WHERE _dupe_num > 1;

This query will assign a row number to your table, based on the [dupe-column-list] fields you'll add there. You can provide an order for those records. The delete statement will remove all records showing up more than once for that [dupe-column-list] (except the first occurrance).

Edit: just noticed this is oracle. Not sure if the above will work for you then. The above is MSSQL.

Upvotes: 0

Related Questions