Reputation: 471
I am trying to delete duplicate rows from a CSV file using SAS but haven't been able to do so. My data looks like:
site1,variable1,20151126000000,22.8,140,1
site1,variable1,20151126010000,22.8,140,1
site1,variable2,20151126000000,22.8,140,1
site1,variable2,20151126000000,22.8,140,1
site2,variable1,20151126000000,22.8,140,1
site2,variable1,20151126010000,22.8,140,1
The 4th row is a duplicate of the 3rd. This is just an example; the file has more than a thousand records. I tried creating subsets, but didn't get the desired results. Thanks in advance for any help.
Upvotes: 0
Views: 8186
Reputation: 551
This paper describes several options for removing duplicate rows: https://support.sas.com/resources/papers/proceedings17/0188-2017.pdf
If the duplicate rows end up adjacent after sorting, the easiest way is to use the noduprecs option:
proc sort data = file noduprecs;
by some_column;
run;
In contrast to the nodupkey option, noduprecs always compares entire rows: no matter which column or columns you state after the by statement, it removes a row only when it is identical to the previous row across all columns.
Edit: Apparently, noduprecs only catches duplicates that are adjacent after sorting, so the by columns must be enough to bring duplicate rows next to each other (have a look at the comment below).
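To make sure every pair of identical rows actually lands next to each other, you can sort by every column. A minimal sketch, reusing the dataset name file from above:

```
/* Sorting by _all_ guarantees identical rows become adjacent,
   so noduprecs can remove every duplicate. */
proc sort data = file noduprecs;
  by _all_;
run;
```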
Upvotes: 0
Reputation: 502
I think you can use nodupkey for this: just reference your key columns, or use _all_ to compare entire rows -
proc sort data = file nodupkey;
by _all_;
run;
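Since the data starts out in a CSV file, here is a sketch of the full round trip: read the file in, deduplicate, and write it back out. The file paths are placeholders, and getnames=no assumes the CSV has no header row, as in the sample above:

```
/* Read the CSV (no header row in the sample data). */
proc import datafile="/path/to/data.csv" out=file dbms=csv replace;
  getnames=no;
run;

/* Drop rows that are duplicates across all columns. */
proc sort data = file nodupkey;
  by _all_;
run;

/* Write the deduplicated data back to a CSV. */
proc export data=file outfile="/path/to/data_dedup.csv" dbms=csv replace;
run;
```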
Upvotes: 2