MPT

Reputation: 25

Can someone please explain step by step what these 'by' and 'if' statements in a SAS data step are doing?

I'm new to SAS and trying to understand what is happening in the code below:

data def;
     set abc;
      by id;
     if last.id;
run;

I understand that by is used for sorting by id column, but what is if last.id doing?

Many thanks for help!

Upvotes: 0

Views: 267

Answers (2)

Richard

Reputation: 27508

The

if last.id;

is a special form of if that differs from what you see in other languages: notice there is no THEN clause. This form is known as a subsetting if. The DATA step continues past that if only when the test expression is true; otherwise SAS stops processing the current observation and does not output it.

In your code the test-expression is last.id.

last.<variable> (and the corresponding first.<variable>) are automatic temporary variables created for each BY variable. They indicate whether the current row is at the edge of a BY group, that is, whether it is the first or the last row of the group.

From these two flags you can infer where the current observation sits within its group:

FIRST.  LAST.   where
  1             at first in group
  0             not at first in group
          1     at last in group
          0     not at last in group
  1       0     group has >= 2 rows and currently at first in group
  0       0     group has >= 3 rows and currently in the middle part
  0       1     group has >= 2 rows and currently at last in group
  1       1     group has only 1 row
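
The table above can be reproduced with a small experiment. This is only a sketch: the data set name ABC comes from the question, while the values, the output data set FLAGS, and the variable names first_id and last_id are made up for illustration. Because first.id and last.id are temporary, they are copied into ordinary variables so they show up in the printed output.

```sas
/* Hypothetical sample data: ids 1, 2 and 3 with 3, 1 and 2 rows */
data abc;
    input id value;
    datalines;
1 10
1 20
1 30
2 40
3 50
3 60
;
run;

/* BY-group processing expects the input to be sorted by the BY variable */
proc sort data=abc;
    by id;
run;

data flags;
    set abc;
    by id;
    /* first.id and last.id are temporary, so copy them to keep them */
    first_id = first.id;
    last_id  = last.id;
run;

proc print data=flags noobs;
run;
```

With this sample data, the three rows of id 1 get the flag pairs (1,0), (0,0) and (0,1), the single row of id 2 gets (1,1), and the two rows of id 3 get (1,0) and (0,1).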

You do need to understand the underlying premise of the implicit loop that is fundamental to DATA/SET processing.

Upvotes: 1

Dirk Horsten

Reputation: 3845

data def;

Create a data set named DEF (which will be stored in the default library WORK);

    set abc;

Get your input from data set ABC (which SAS will look for in the default library WORK). This automatically generates a loop over the observations (1);

    by id;

Add temporary variables (2) first.id and last.id to indicate whether this is the first/last observation with that id (3). Note that by does not sort anything itself: ABC must already be sorted (or indexed) by id;

    if last.id;

All statements below, as well as the implicit output; statement, are executed only for the last observation of each id. It is equivalent to if not last.id then delete;

run;

Compile the above and run;
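
To check the claimed equivalence for yourself, something like the following sketch could be used. The sample values and the data set names ABC_SORTED and DEF2 are assumptions made for illustration; only ABC and DEF come from the question.

```sas
/* Hypothetical sample data */
data abc;
    input id value;
    datalines;
1 10
1 20
1 30
2 40
3 50
3 60
;
run;

/* Sort first, because the BY statement itself does not sort */
proc sort data=abc out=abc_sorted;
    by id;
run;

/* Subsetting if: keep only the last observation of each id */
data def;
    set abc_sorted;
    by id;
    if last.id;
run;

/* Same result, written as an explicit delete */
data def2;
    set abc_sorted;
    by id;
    if not last.id then delete;
run;

/* Report any differences between the two results */
proc compare base=def compare=def2;
run;
```

With this sample data both DEF and DEF2 contain three observations: (1, 30), (2, 40) and (3, 60).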

(1) In a SAS data step, you should (almost) never write something like

read file;
while not eof;
    do some stuff;
    read file;
end;

This is automated by the set and merge statements.
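
A minimal way to watch the implicit loop at work, assuming some made-up sample data: the step body below contains no loop, yet it runs once per observation. The automatic variable _n_ counts the iterations, and the end= option of the set statement sets a flag on the last observation, which replaces the manual end-of-file test. The name is_last is a hypothetical choice.

```sas
/* Hypothetical sample data */
data abc;
    input id value;
    datalines;
1 10
1 20
2 40
;
run;

/* No explicit loop: SAS repeats the step body for every observation */
data _null_;
    set abc end=is_last;      /* is_last becomes 1 on the final observation */
    put _n_= id= is_last=;    /* write one line to the log per iteration */
run;
```

The log should show one line per observation of ABC, with is_last equal to 1 only on the final one.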

(2) Temporary variables exist in the "program data vector" (i.e. they are in scope in the data step), but are not written to the output data set.

(3) The word observation in SAS jargon means the same as row in database jargon. The difference from a record is that a SAS table, like a database table, knows its own structure. Without that, the by statement and many other facilities of the language could not exist.

Upvotes: 0
