user3658367

Reputation: 641

subset of dataset using first and last in sas

Hi, I am trying to subset a dataset that contains the following:

ID sal count 
1  10  1
1  10  2
1  10  3
1  10  4
2  20  1
2  20  2
2  20  3
3  30  1
3  30  2
3  30  3
3  30  4

I want to extract only those IDs that are recorded 4 times.

I wrote this:

data AN; set BU
if last.count gt 4 and last.count lt 4 then delete;
run;

But there is something wrong.

Upvotes: 0

Views: 146

Answers (3)

Reeza

Reputation: 21274

A slight variation on Tim's answer, assuming you don't necessarily have the count variable:

proc sql;
  CREATE TABLE AN as
  SELECT * FROM BU
  GROUP BY ID
  HAVING Count(ID) >= 4;
quit;

Upvotes: 0

Tim Sands

Reputation: 1068

EDIT - Thanks for clarifying. Based on your needs, PROC SQL will be more direct:

proc sql;
  CREATE TABLE AN as
  SELECT * FROM BU
  GROUP BY ID
  HAVING MAX(COUNT) = 4
;quit;

For posterity, here is how you could do it with only a data step:

In order to use first. and last., you need a BY statement, which requires the data to be sorted:

proc sort data=BU;
 by ID DESCENDING count;
run;

When a SET statement is used with BY ID, first.ID equals 1 (TRUE) on the first record of a given ID and 0 (FALSE) on all other records. Because the data is sorted by descending count, that first record holds the largest count for the ID.

data AN;
  set BU;
  by ID;
  retain keepMe;
  If first.ID THEN DO;
     IF count = 4 THEN keepMe=1;
     ELSE keepMe=0;
  END;

  if keepMe=0 THEN DELETE;
run;

During the data step, before the DELETE is applied, your data will look like this:

ID sal count keepMe  first.ID
1  10  4     1       1
1  10  3     1       0
1  10  2     1       0
1  10  1     1       0
2  20  3     0       1
2  20  2     0       0
2  20  1     0       0
3  30  4     1       1
3  30  3     1       0
3  30  2     1       0
3  30  1     1       0
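
If you would rather not depend on the count variable or on a descending sort, one possible alternative is a double DO loop over each BY group (sometimes called a DoW loop). This is only a sketch, assuming BU is sorted by ID; n and i are helper variables introduced here for illustration and are dropped from the output:

```sas
proc sort data=BU;
  by ID;
run;

data AN;
  /* Pass 1: read one whole ID group and count its rows in n */
  do n = 1 by 1 until (last.ID);
    set BU;
    by ID;
  end;
  /* Pass 2: re-read the same rows and output them only if the group has 4 rows */
  do i = 1 to n;
    set BU;
    if n = 4 then output;
  end;
  drop n i;
run;
```

Each SET statement keeps its own read position, so the second loop reads exactly the rows that the first loop just counted.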

Upvotes: 2

Vasilij Nevlev

Reputation: 1449

If I understand correctly, you are trying to extract all observations whose ID is repeated 4 times or more. If so, your use of last.count and first.count is wrong. last.var is a boolean that indicates whether an observation is the last one in its BY group; it does not hold a count. Have a look at Tim's suggestion.
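
To see what these flags hold, here is a minimal sketch that assumes the dataset BU from the question; the names flags, isFirst and isLast are made up for illustration. first.ID and last.ID are automatic variables that are not written to the output dataset, so the step copies them into ordinary variables:

```sas
proc sort data=BU;
  by ID;
run;

data flags;
  set BU;
  by ID;
  /* first.ID and last.ID are temporary, so copy them to keep them */
  isFirst = first.ID;  /* 1 on the first row of each ID, otherwise 0 */
  isLast  = last.ID;   /* 1 on the last row of each ID, otherwise 0 */
run;
```

Printing flags shows that both variables only ever contain 0 or 1, which is why comparing last.count with 4 can never select anything.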

To extract all observations whose ID is repeated four times or more, I would suggest the following PROC SQL:

PROC SQL;
   CREATE TABLE WORK.WANT AS 
   SELECT /* COUNT_of_ID */
            (COUNT(t1.ID)) AS COUNT_of_ID, 
          t1.ID, 
          t1.SAL, 
          t1.count
      FROM WORK.HAVE t1
      GROUP BY t1.ID
      HAVING (CALCULATED COUNT_of_ID) ge 4
      ORDER BY t1.ID,
           t1.SAL,
           t1.count;
QUIT;

Result (ID, SAL and count columns shown):

1   10  1
1   10  2
1   10  3
1   10  4
3   30  1
3   30  2
3   30  3
3   30  4

Upvotes: 1
