Reputation: 253
I have a dataset that contains 250 variables. I think some rows may be exact duplicates. If I only had 3 variables, I could run this code to check for dupes:
proc sql;
create table checkDupe as
select count(*) as N, *
from bigTable
group by 1, 2, 3
having N > 1;
quit;
However, with 250 variables, I don't want to type out group by 1, 2, 3, ... , 250
.
The following group by
statements don't work:
group by *
group by _ALL_
group by 1:250
Is there a concise way to group by all variables?
Upvotes: 1
Views: 7274
Reputation: 6378
Your sample code checking for duplicates returns a dataset that has all of the records where the key is NOT unique. So the below returns four records:
data bigtable;
input a b c d;
datalines;
1 1 1 1
1 1 1 1
1 2 1 1
1 2 2 1
1 2 2 2
;
run;
proc sql;
create table checkDupe1 as
select *,count(*) as N
from bigTable
group by 1, 2, 3
having N > 1;
quit;
Assuming you don't need the N
column, they easiest way in SAS to get a dataset of all non-unique rows is to use PROC SORT
with the NOUNIQUEKEY
option. So below will return the same four records:
proc sort data=bigtable out=checkDupe2 nouniquekey ;
by a b c ;
run ;
And the BY
statement can be BY _ALL_
, which will return the two records that are duplicates on all variables:
proc sort data=bigtable out=checkDupe3 nouniquekey ;
by _all_ ;
run ;
Upvotes: 1
Reputation: 10855
If the desired output is each unique row from a SAS data set, no grouping is required. The easiest way to execute this in PROC SQL
is with the distinct
keyword.
For example:
data one;
input a b c d;
datalines;
1 1 1 1
1 1 1 1
1 2 1 1
1 2 3 4
1 2 1 1
;
run;
proc sql;
select distinct * from one;
run;
...produces the following output, where the two duplicate rows are eliminated.
Upvotes: 0
Reputation: 4554
Try to produce macro variable for column names of dataset, then group in sql;
proc sql;
select name into: name separate ',' from dictionary.columns where libname='SASHELP' and memname='CLASS';
quit;
proc sql;
select count(*) as N,* from sashelp.class group by &name;
quit;
If just want to know whether there are complete duplicates in dataset, try this:
proc sort data=sashelp.class out=want nodup;
by _all_;
run;
Upvotes: 3