monkey
monkey

Reputation: 7

How to delete some of the observations in a column?

I have a dataset like this:

id
3408
3408
3485
4592
4932
5345
5345
5345
5844
5844
5844

I only want to keep the ids that appear 3 times (i.e., keep id=5345 and id=5844) and delete the rest. How do I achieve this in SAS? My data are sorted by id in order. I want to keep all three duplicated IDs in output dataset

Upvotes: 0

Views: 159

Answers (3)

Joe
Joe

Reputation: 63424

PROC FREQ will get you this directly.

proc freq data=myid;
tables id/out=threeobs(keep=count id where=(count=3));
run;

Use >= instead of = if you mean 3 or more. Based on comments, here is an example of merging back to the original data:

data have;
input id;
datalines;
3408
3408
3485
4592
4932
5345
5345
5345
5844
5844
5844
;;;;
run;

proc freq data=have;
tables id/out=ids(where=(count=3) keep=id count);
run;

proc sort data=have;
by id;
run;
data want;
merge have(in=h) ids(in=i);
by id;
if i;
run;

Upvotes: 2

cmjohns
cmjohns

Reputation: 4475

I wasn't sure if you wanted just a list of ID's that appeared 3 times or all rows who have an id that is replicated 3 times. If you want the former, the @bellvueBob's code will get you there.

Otherwise, here is one way to just get a list of the ID's that appear in the data set 3 times. The advantage to this code is small memory usage and speed since the data set is already sorted.

data threeobs(keep=id);
  set myid;
  by id;
  if first.id then cnt=1;
  else cnt+1;
  if cnt=3 then output;
run;

Upvotes: 2

BellevueBob
BellevueBob

Reputation: 9618

Using PROC SQL, you can JOIN and create a new data set, something like this:

proc sql;
   create table want as
   select a.*
   from have a
   join (
      select id
      from   have
      group by id
      having count(*) = 3
      ) b
   on b.id=a.id
quit;

Upvotes: 3

Related Questions