Reputation: 7
I have a dataset like this:
id
3408
3408
3485
4592
4932
5345
5345
5345
5844
5844
5844
I only want to keep the ids that appear 3 times (i.e., keep id=5345 and id=5844
) and delete the rest. How do I achieve this in SAS? My data are sorted by id
in order. I want to keep all three duplicated IDs in output dataset
Upvotes: 0
Views: 159
Reputation: 63424
PROC FREQ will get you this directly.
proc freq data=myid;
tables id/out=threeobs(keep=count id where=(count=3));
run;
Use >= instead of = if you mean 3 or more. Based on comments, here is an example of merging back to the original data:
data have;
input id;
datalines;
3408
3408
3485
4592
4932
5345
5345
5345
5844
5844
5844
;;;;
run;
proc freq data=have;
tables id/out=ids(where=(count=3) keep=id count);
run;
proc sort data=have;
by id;
run;
data want;
merge have(in=h) ids(in=i);
by id;
if i;
run;
Upvotes: 2
Reputation: 4475
I wasn't sure if you wanted just a list of ID's that appeared 3 times or all rows who have an id that is replicated 3 times. If you want the former, the @bellvueBob's code will get you there.
Otherwise, here is one way to just get a list of the ID's that appear in the data set 3 times. The advantage to this code is small memory usage and speed since the data set is already sorted.
data threeobs(keep=id);
set myid;
by id;
if first.id then cnt=1;
else cnt+1;
if cnt=3 then output;
run;
Upvotes: 2
Reputation: 9618
Using PROC SQL, you can JOIN
and create a new data set, something like this:
proc sql;
create table want as
select a.*
from have a
join (
select id
from have
group by id
having count(*) = 3
) b
on b.id=a.id
quit;
Upvotes: 3