eykanal
eykanal

Reputation: 27047

Delete rows from A matching rows in B in SAS

I currently have a table with multiple duplicates across the primary key columns due to a lack of integrity checks. Be that as it may, I'm attempting to remove the duplicates. The problem is, there's no id column, which means that finding the duplicates is nontrivial.

My current solution involves using the count(*)... having construct to create a second table, and selecting the rows to be deleted into there. My problem is that the SAS delete command doesn't allow for the following:

proc sql;
delete from TableA
where (v1,v2,v3) in TableB

Is there any way to delete from one table based on the contents of another?

Upvotes: 1

Views: 14681

Answers (3)

Gordon Linoff
Gordon Linoff

Reputation: 1270401

You can do this with a correlated subquery:

proc sql;
delete from TableA
    where exists (select 1
                  from TableB
                  where b.pk = TableA.v1 or b.pk = TableA.v2 or b.pk = TableA.v3
                 )

Upvotes: 1

BellevueBob
BellevueBob

Reputation: 9618

If I understand correctly, you want to remove every observation from your dataset where more than one observation has the same value of your "key" variables (removing ALL duplcates).

The best and easiest way to do that with SAS is to sort that dataset by your "key" variables and then use another data step to create your new copy. Harder to explain than to illustrate:

data have;
   input x y z;
   datalines4;
1 2 3
1 2 3
2 3 4
3 4 5
3 4 6
3 4 7
4 5 6
4 5 6
;;;;
run;

proc sort data=have;
   by x y z;
run;

data want;
   set have;
      by x y z;
   if  first.z and last.z;
run; 

The sub-setting IF statement keeps only the four "unique" observations using the automatic FIRST. and LAST. variables created when you use a BY statement.

If instead you wanted to keep one of the duplicates, you could use the NODUPKEY option with PROC SORT:

proc sort nodupkey data=have out=want2;
   by x y z;
run;

This will eliminate two of the observations from the example, leaving you with unique values for your key. You cannot control which observations are retained using this technique.

Upvotes: 2

waka
waka

Reputation: 3417

Try this:

proc sql;
delete from TableA as a
where a.v1 = (select b.v1 from TableB as b where a.primaryKey = b.foreignKeyForTableA)

and so on for the other values. However, since you may recieve duplicates (ie more than one result) from TableB you might want to try out "select distinct" or "select Top 1" to only get one result.

Upvotes: 1

Related Questions