asmi

Reputation: 471

Delete duplicate rows in SAS

I am trying to delete duplicate rows from a CSV file using SAS but haven't been able to. My data looks like this:

site1,variable1,20151126000000,22.8,140,1
site1,variable1,20151126010000,22.8,140,1
site1,variable2,20151126000000,22.8,140,1
site1,variable2,20151126000000,22.8,140,1
site2,variable1,20151126000000,22.8,140,1
site2,variable1,20151126010000,22.8,140,1

The 4th row is a duplicate of the 3rd. This is just an example; the actual file has more than a thousand records. I tried creating subsets but didn't get the desired results. Thanks in advance for any help.
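
The question doesn't show how the CSV gets into SAS. Here is a minimal sketch of that step. It assumes the file has no header row (the sample shows none) and uses hypothetical column names, since the real ones aren't given. To stay self-contained, it reads the sample rows from DATALINES. The comment shows the change needed to read the real file, and that file path is hypothetical too.

```sas
/* Read the sample rows from the question into a data set named RAW.
   Column names are hypothetical (the question does not name them).
   To read the real file instead, change the INFILE statement to
     infile "/path/to/data.csv" dsd truncover;   (hypothetical path)
   and delete the DATALINES block below. */
data raw;
  infile datalines dsd truncover;
  /* Read the 14-digit timestamp as text so it is compared and
     displayed exactly as written */
  length site $10 variable $20 timestamp $14;
  input site $ variable $ timestamp $ value1 value2 value3;
  datalines;
site1,variable1,20151126000000,22.8,140,1
site1,variable1,20151126010000,22.8,140,1
site1,variable2,20151126000000,22.8,140,1
site1,variable2,20151126000000,22.8,140,1
site2,variable1,20151126000000,22.8,140,1
site2,variable1,20151126010000,22.8,140,1
;
run;
```

Once the data are in a data set such as `raw`, either of the PROC SORT calls in the answers can be applied to it.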

Upvotes: 0

Views: 8186

Answers (2)

the_economist

Reputation: 551

This paper describes several options for removing duplicate rows: https://support.sas.com/resources/papers/proceedings17/0188-2017.pdf

If identical rows end up next to each other after sorting (for example, when the data are sorted by all columns), the easiest way is to use the NODUPRECS option:

/* NODUPRECS drops a row only if it matches the previous output row in every column */
proc sort data = file noduprecs;
by some_column;
run;

Unlike NODUPKEY, which compares only the BY variables, NODUPRECS compares every column, no matter which column or columns you list in the BY statement.

Edit: As pointed out in a comment on this answer, this only works reliably if the data are sorted by all columns, because NODUPRECS compares each row only with the row before it.
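
Here is a small sketch of the pitfall described in the edit. It uses a hypothetical three-row data set `demo` that is not taken from the question. NODUPRECS compares each row only with the previous row written to the output. If the BY statement leaves two identical rows separated by a different row, both copies survive. `EQUALS` keeps rows with equal BY values in their input order. It is the PROC SORT default and is stated explicitly here only to make the example's behavior clear.

```sas
/* Hypothetical data: rows 1 and 3 are identical, and row 2 has the
   same SITE value and sits between them */
data demo;
  input site $ value;
  datalines;
A 1
A 2
A 1
;
run;

/* Sorting only by SITE keeps the input order within site A, so the
   two "A 1" rows are not adjacent and NODUPRECS removes neither */
proc sort data=demo out=by_site noduprecs equals;
  by site;
run;

/* Sorting by every column makes identical rows adjacent, so the
   second "A 1" row is removed */
proc sort data=demo out=by_all noduprecs equals;
  by _all_;
run;
```

With this data, `by_site` still contains both `A 1` rows (three rows in total), while `by_all` contains two rows.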

Upvotes: 0

SMW

Reputation: 502

I think you can use NODUPKEY for this. Either list your key variables in the BY statement, or use _ALL_ to treat every column as part of the key:

/* BY _ALL_ makes every column part of the key, so only fully identical rows are dropped */
proc sort data = file nodupkey;
by _all_;
run;
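
This sketch shows a variant of the same call. `OUT=` writes the de-duplicated rows to a new data set, which leaves `file` itself unchanged. `DUPOUT=` collects the rows that were dropped, so it is easy to check what was removed. The small `file` data set and its column names are hypothetical stand-ins for the question's data. `deduped` and `dups` are illustrative names.

```sas
/* Small hypothetical stand-in for the question's data */
data file;
  infile datalines dsd truncover;
  length site $10 variable $20 timestamp $14;
  input site $ variable $ timestamp $ value1 value2 value3;
  datalines;
site1,variable2,20151126000000,22.8,140,1
site1,variable2,20151126000000,22.8,140,1
site2,variable1,20151126000000,22.8,140,1
;
run;

/* DEDUPED gets one copy of each distinct row; DUPS gets the extra copies */
proc sort data=file out=deduped dupout=dups nodupkey;
  by _all_;
run;

/* Inspect the rows that were removed */
proc print data=dups;
run;
```

Here `deduped` should hold two rows and `dups` the single repeated `site1,variable2` row.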

Upvotes: 2
