Skyler
Skyler

Reputation: 170

Keeping only the duplicates

I'm trying to keep only the duplicate results for one column in a table. This is what I have.

proc sql; 
    create table DUPLICATES as 
    select Address, count(*) as count 
    from TEST_TABLE
    group by Address
    having COUNT gt 1 
    ;
quit;

Is there any easier way to do this or an alternative I didn't think of? It seems goofy that I then have to re-join it with the original table to get my answer.

Upvotes: 2

Views: 30739

Answers (4)

J.Q
J.Q

Reputation: 1031

Using proc sort with nodupkey and dupout will dedupe the data and give you an "out" dataset with duplicate records from the original dataset, but the "out" dataset does not include EVERY record with the ID variable - it gives you the 2nd, 3rd, 4th...Nth. So you aren't comparing all the duplicate occurrences of the ID variable when you use this method. It's great when you know what you want to remove and define enough by variables to limit this precisely, or if you know that your records with duplicate IDs are identical in every way and you just want them removed.

When there are duplicates in a raw file I receive, I like to compare all records where ID has more than one occurrence.

proc sort data=test nouniquekeys
     uniqueout=singles
           out=dups;
by=ID;
run;
  • nouniquekeys deletes unique observations from the "out" DS
  • uniqueout=dsname stores unique observations
  • out=dsname stores remaining observations

Again, this method is great for working with messy raw data, and for debugging if your code might have produced duplicates.

Upvotes: 8

isJustMe
isJustMe

Reputation: 5470

That's easy using a data step:

proc sort data=TEST_TABLE nodupkey dupout=dups; 
 by Address; 
run; 

Refer to this documentation for further information

Upvotes: 7

DCWoods
DCWoods

Reputation: 123

proc sort data=TEST_TABLE;
    by Address;
run;

data DUPLICATES;
    set TEST_TABLE;
    by Address;
    if not (first.Address and last.Address) then output;
run;

Upvotes: 8

Thomas Barnes
Thomas Barnes

Reputation: 76

select field,count(field) from table 
group by field having count(field) > 1

Upvotes: 1

Related Questions