Reputation: 507
I have a macro inspired by "PROC SQL by Example" that finds duplicate rows based on a single column/field:
data have ;
input name $ term $;
cards;
Joe 2000
Joe 2000
Joe 2002
Joe 2008
Sally 2001
Sally 2003
; run;
%MACRO DUPS(LIB, TABLE, GROUPBY) ;
PROC SQL ;
CREATE TABLE DUPROWS AS
SELECT &GROUPBY, COUNT(*) AS Duplicate_Rows
FROM &LIB..&TABLE
GROUP BY &GROUPBY
HAVING COUNT(*) > 1
ORDER BY Duplicate_Rows;
QUIT;
%MEND DUPS ;
%DUPS(WORK,have,name) ;
proc print data=duprows ; run;
I would like to extend this to look for duplicates based on multiple columns (Rows 1 and 2 in my example), but still be flexible enough to deal with a single column.
In this case it would run the code:
proc sql ;
create table duprows as select name,term,count(*) as Duplicate_Rows
from work.have
group by name,term
HAVING COUNT(*) > 1
;quit;
To produce:
Upvotes: 1
Views: 59
Reputation: 5586
To include an arbitrary number of fields to group on, you can list them all in the groupby
macro parameter, but the list must be comma-delimited and surrounded by %quote()
. Otherwise SAS will see the commas and think you're providing more macro parameters.
So in your case, your macro call would be:
%dups(lib = work, table = have, groupby = %quote(name, term));
Since &groupby
is included in the select
and group by
clauses, all fields listed will appear in the output and will be used for grouping. This is because when &groupby
resolves, it becomes the text name, term
.
Upvotes: 2