Swati Kanchan
Swati Kanchan

Reputation: 99

Do looping to match duplicates in SAS

I have a dataset where I have different names in one column, the names can be duplicate. My task here is to compare each and every name with the rest of the names in the column.For example if I take the name 1 "Vishal" I have to compare it with all the names from 2 to 13. If there is a matching name from row 2 to 13 there will be different column made "flag" with value of Y if there is a duplicate if no duplicate then a value of N.I have to perform this operation with all the names in the group

I have written a code which looks like this:

data Name;
input counter name $50.;
cards;
1 vishal
2 swati
3 sahil
4 suman
5 bindu
6 bindu
7 vishal
8 tushar
9 sahil
10 swati
11 gudia
12 priyansh
13 priyansh
;

proc sql;
select count(name) into: n from swati;
quit;

proc sql;
select name into: name1 -:name13 from swati;
quit;

options mlogic mprint symbolgen;
%macro swati;
data name1;
set swati;
%do i = 1 %to 1;
%do j= %eval(&i.+1) %to &n.;
if &&name&i. =&&name&j. then flag="N";
else flag="Y";
%end;
%end;
run;
%mend;
%swati;

WE HAVE TO DO THIS USING A DO LOOP

Upvotes: 0

Views: 447

Answers (2)

Tom
Tom

Reputation: 51566

Your answer for the last observation does not look right. Is there another condition such that if it is the last record the flag should be 'N' instead of 'Y'?

I really see no reason why you have to use a DO loop. But you could place a DO loop around a SET statement with the POINT= option to look for matching names.

data want ;
  set name nobs=nobs ;
  length next $50;
  next=' ';
  do p=_n_+1 to nobs until (next=name) ;
    set name(keep=name rename=(name=next)) point=p;
  end;
  if next=name then flag='N'; else flag='Y';
  drop next;
run;

You could also take advantage of the COUNTER variable and do it using GROUP BY in a SELECT statement in PROC SQL.

proc sql ;
  create table want2 as
    select *
         , case when (counter = max(counter)) then 'Y' else 'N' end as flag
    from name
    group by name
    order by counter
  ;
quit;

Upvotes: 0

Reeza
Reeza

Reputation: 21274

  1. Sort data by Name
  2. Use a data step with BY to identify duplicates
  3. Resort by Order if desired

    proc sort data=name;
    by name;
    run;
    
    data want;
    set name;
    by name;
    if first.name and last.name then unique='Y';
    else unique='N';
    run;
    
    proc sort data=want;
    by counter;
    run;
    

Upvotes: 2

Related Questions