Reputation: 689
I want to use select distinct
in a proc sql
statement for a specific column. I have around 10 columns, and I don't want to check the duplication for the whole 10 columns, I just want to check the duplication for one column.
So when I write the proc sql
like below, select distinct
checks the duplication through the all 10 columns:
proc sql;
create table newtable as
select distinct Col1, Col2, Col3, Col4, ... , Col10 from oldtable;
quit;
I tried to write something like below to check the duplication for only Col3, and adding other columns too into the new table. But it failed. I also tried different combinations like adding a new select distinct Col3
after the from oldtable
, but all combinations failed. So what is the trick?
proc sql;
create table newtable as
select (distinct Col3), * from oldtable;
quit;
Upvotes: 1
Views: 5867
Reputation: 63434
That's not how select distinct
works: it can only select distinct values from the entire return clause.
It's unclear what you mean by 'check duplication'. Assuming SAS finds a duplicate on one of the rows of your particular column, which row is it supposed to pick?
The easiest way to do this, if you don't particularly care which row comes back, is to use proc sort nodupkey
. SQL doesn't actually do this very well, particularly the ANSI 1999 version SAS uses; you can group by
the main variable and use max
or min
or similar for all of the other variables, but that doesn't necessarily give you a single real row.
If you do want a single real row, union
does have automatic deduplication in it; so you can use it like this. Selecting where 0
means nothing gets selected, but allows us to use union
.
data class_Extra;
set sashelp.class;
output;
if _n_=5 then output;
run;
proc sql;
create table class_nodup as
select * from class_extra where 0
union
select * from class_extra;
quit;
But a better solution: fix the original problem that created the duplicate.
Upvotes: 3