Reputation: 25
I'm new to SAS and trying to understand what is happening in the code below:
data def;
set abc;
by id;
if last.id;
run;
I understand that by is used for sorting by the id column, but what is if last.id doing?
Many thanks for help!
Upvotes: 0
Views: 267
Reputation: 27508
The statement if last.id; is a special form of if that differs from other programming languages: notice there is no THEN clause. This form is known as a subsetting if. The DATA step's flow of control continues past that if only when the test-expression is true; otherwise the step stops processing the current observation, writes nothing for it, and moves on to the next one.
In your code the test-expression is last.id.
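As a minimal sketch of the subsetting if, the two steps below should produce identical data sets. The sample table sashelp.class ships with SAS; the output names teens_a and teens_b and the variable age_months are made up for illustration.

```sas
/* Subsetting IF: no THEN clause. Rows that fail the test are dropped */
/* and the statements after the IF are skipped for them.              */
data teens_a;
    set sashelp.class;
    if age >= 13;
    age_months = age * 12;   /* only reached when age >= 13 */
run;

/* The same logic written with an explicit DELETE. */
data teens_b;
    set sashelp.class;
    if not (age >= 13) then delete;
    age_months = age * 12;
run;

/* Compare the two results; no differences are expected. */
proc compare base=teens_a compare=teens_b;
run;
```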
last.<variable> (and the corresponding first.<variable>) are two automatic temporary variables created for each BY variable. They indicate whether the current row is at a group edge: either the first or the last row of a BY group.
You can infer where the current observation is within the group from the two flags (- means either value):
FIRST.  LAST.   where
1       -       at first in group
0       -       not at first in group
-       1       at last in group
-       0       not at last in group
1       0       group has >= 2 rows and currently at first in group
0       0       group has >= 3 rows and currently in the middle part
0       1       group has >= 2 rows and currently at last in group
1       1       group has only 1 row
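To see these flag combinations on real rows, one option is to copy the temporary flags into ordinary variables so that they are kept in the output. This is only a sketch: the sample values and the names flags, is_first and is_last are invented for the demonstration.

```sas
/* Small sample input, already in id order. */
data abc;
    input id value;
    datalines;
1 10
1 20
1 30
2 40
3 50
3 60
;
run;

/* first.id and last.id are not written to the output data set, */
/* so copy them into regular variables to inspect them.         */
data flags;
    set abc;
    by id;
    is_first = first.id;
    is_last  = last.id;
run;

proc print data=flags noobs;
run;
```

With this input, id 1 shows the patterns 1 0, 0 0 and 0 1 in turn, id 2 (a single row) shows 1 1, and id 3 shows 1 0 followed by 0 1.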
To follow all of this, you do need to understand the premise of the implicit loop that is fundamental to DATA/SET processing.
Upvotes: 1
Reputation: 3845
data def;
Create a data set named DEF (which will be stored in the default library WORK).
set abc;
Get your input from data set ABC (which SAS will look for in the default library WORK). This automatically generates a loop over the observations (1).
by id;
Add temporary variables (2) first.id and last.id to indicate whether this is the first/last observation with that id (3).
if last.id;
All statements below, as well as the implicit output; statement, apply only to the last observation of each id. It is equivalent to if not last.id then delete;
run;
Compile the step above and run it.
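One point worth making explicit: the by statement in a DATA step does not sort anything. It expects the input to be ordered by id already (or indexed on it), and the step stops with an error if it is not. A hedged end-to-end sketch follows, with invented sample values and a hypothetical intermediate data set named abc_sorted:

```sas
/* Sample input that is deliberately NOT in id order. */
data abc;
    input id value;
    datalines;
3 50
1 10
2 40
1 20
3 60
;
run;

/* Sort first; the BY statement in the DATA step relies on this order. */
proc sort data=abc out=abc_sorted;
    by id;
run;

/* Keep only the last observation of each id. */
data def;
    set abc_sorted;
    by id;
    if last.id;
run;

proc print data=def noobs;
run;
```

DEF ends up with one row per id, namely whichever row comes last for that id in the sorted data.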
(1) In a SAS data step, you should (almost) never write something like
read file;
while not eof;
do some stuff;
read file;
end;
This is automated by the set and merge statements.
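A small way to watch the implicit loop at work, assuming the sample table sashelp.class is available: the step below contains no loop statement, yet its body runs once per observation, and the automatic variable _N_ counts the iterations.

```sas
/* No DO loop is written here; SET drives one iteration per observation */
/* and the step ends when SET tries to read past the last one.          */
data _null_;
    set sashelp.class;
    put _n_= name=;   /* writes the iteration number and the name to the log */
run;
```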
(2) Temporary variables exist in the "program data vector" (i.e. they are in scope in the data step), but are not written to the output data set.
(3) The word observation in SAS jargon means the same as row in database jargon. The difference from a record is that a SAS table, like a database table, knows its own structure. Without that, the by statement and many other facilities of the language could not exist.
Upvotes: 0