Reputation: 2640
When running a data step in SAS, why does the output statement seem to 'stop' the iterating of the set statement?
I need to conditionally output duplicate observations. While I could use a plethora of output statements, I'd like SAS to do its normal iterating and have output just create an additional observation.
1) Does the run statement in SAS have a built-in output statement? (The way sum statements have a built-in retain.)
2) What is happening when I ask SAS to output certain observations, in particular after a set statement? Will it set all the values until a condition and then only keep the values I request? Or does it have some similarities with other statements, such as the point= option?
3) Is there a statement similar to output that will continue to set the values from a previous data step and then output an additional observation when requested?
For example:
data test;
   do i = 1 to 100;
      output;
   end;
run;
data test2;
   set test;
   if _N_ in (4 8 11) then output;
run;
data test3;
   set test;
   if _N_ in (4 8 11) then output;
   output;
run;
test has 100 observations, test2 has 3 observations, and test3 has 103 observations. This makes me think that there is some kind of built-in output statement for either the run statement or the data step itself.
Upvotes: 1
Views: 1984
Reputation: 51611
You are very close.
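1) There is an implied OUTPUT at the end of the data step, unless your data step includes an explicit OUTPUT statement. That is why test2, which contains an explicit OUTPUT, wrote only three observations. In test3 the unconditional OUTPUT writes every row once and the conditional one writes rows 4, 8 and 11 a second time, giving 103.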
2) The OUTPUT statement tells SAS to write the current record to the output dataset.
3) There is no direct way to duplicate records without using OUTPUT statements, but for some similar problems you can cause the duplication on the input side instead of the output side.
For example, if you felt your class didn't have enough eleven-year-olds, you could make two copies of every eleven-year-old by reading them twice.
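data want;
   /* BY NAME interleaves the two inputs (sashelp.class is already in name order), */
   /* so each extra eleven-year-old lands right next to the original               */
   set sashelp.class
       sashelp.class(where=(age=11))
   ;
   by name;
run;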
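If you do stay on the output side, a minimal sketch of another way to build the question's test3 is below, assuming its 100-row test dataset (test3_alt, copies and copy_no are hypothetical names). It still relies on OUTPUT, but on a single statement inside a DO loop that runs once for most rows and twice for rows 4, 8 and 11.

```sas
/* Rebuild the 100-row dataset from the question */
data test;
   do i = 1 to 100;
      output;
   end;
run;

/* One OUTPUT statement inside a DO loop: each row is written    */
/* COPIES times, so rows 4, 8 and 11 appear twice (103 rows)     */
data test3_alt;
   set test;
   copies = 1 + (_n_ in (4, 8, 11));  /* the comparison evaluates to 1 or 0 */
   do copy_no = 1 to copies;
      output;
   end;
   drop copies copy_no;
run;
```

The copy count could just as well come from a variable already on the input data, which keeps the duplication rule in the data rather than in a chain of OUTPUT statements.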
Upvotes: 0
Reputation: 12909
The best way to understand everything is by reading about the Program Data Vector (PDV). The short answer to your questions:
The output statement is implied at the run boundary of every SAS data step that uses set, merge, update, or (nothing), unless the step contains an explicit output statement.
The set statement takes the contents of the current row and reads them into the PDV, if you have a single set statement.
The output statement simply writes the contents of the PDV at that moment to your output dataset.
SAS only moves to a new row in the source dataset defined by your set statement when that set statement executes again, which normally happens after the step returns to the top: at the run boundary, a delete statement, a return statement, or a failed subsetting if (an if without then).
point= (an option on the set statement, not a separate statement) forces SAS to go directly to the observation number stored in a variable; otherwise, set reads every row sequentially, one by one.
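A sketch of the point= behavior described above, again assuming the question's test dataset (picked and p are hypothetical names). Because point= reads by observation number, the step never reaches end of file on its own, so it needs an explicit stop; and because it contains an explicit output, only the three rows read are written.

```sas
/* Rebuild the 100-row dataset from the question */
data test;
   do i = 1 to 100;
      output;
   end;
run;

data picked;
   do p = 4, 8, 11;
      set test point=p;  /* jump straight to observation number p     */
      output;            /* explicit OUTPUT: write the row just read    */
   end;
   stop;  /* POINT= never detects end of file, so STOP ends the step    */
run;
```

The point= variable (p here) is not written to picked, so the result holds only i, with the values 4, 8 and 11.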
Upvotes: 1
Reputation: 914
The output statement is implicit at the end of the data step, unless it's used explicitly in one or more places in that data step.
Each time execution encounters an OUTPUT statement, or the implicit one if it exists, it writes a new row.
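Both halves of that rule can be seen side by side in this minimal sketch, assuming the question's test dataset (implicit_only, twice and copy are hypothetical names): with no OUTPUT statement each row is written once, while two explicit OUTPUT statements write each row twice and switch the implicit one off.

```sas
/* Rebuild the 100-row dataset from the question */
data test;
   do i = 1 to 100;
      output;
   end;
run;

/* No OUTPUT statement anywhere: the implicit one at the end */
/* of each iteration writes one row per row read (100 rows)  */
data implicit_only;
   set test;
run;

/* Two explicit OUTPUT statements: each writes the PDV as it is */
/* at that moment, and there is no implicit OUTPUT (200 rows)   */
data twice;
   set test;
   copy = 1;
   output;
   copy = 2;
   output;
run;
```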
Upvotes: 0
Reputation: 63424
output in SAS is an explicit instruction to write a row to the output dataset(s): all of the datasets named in the data statement, unless you name specific datasets on the output statement.
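A minimal sketch of naming datasets on output, assuming the question's test dataset (small and big are hypothetical names): an output with no argument would write the row to both, while output small or output big writes it to that one only.

```sas
/* Rebuild the 100-row dataset from the question */
data test;
   do i = 1 to 100;
      output;
   end;
run;

/* Both datasets are named on the DATA statement; each OUTPUT */
/* names the single dataset that should receive the row       */
data small big;
   set test;
   if i <= 10 then output small;  /* rows 1-10 go to SMALL only   */
   else output big;               /* rows 11-100 go to BIG only   */
run;
```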
run, in addition to ending the step (meaning no statements after run are processed until that data step has completed; basically the equivalent of the closing } of a module in a C-style programming language), contains an implicit return statement.
Unless you are using link or goto, return tells SAS to go back to the beginning of the data step loop. In addition, return contains an implicit output statement that writes the current row to all datasets named in the data statement, unless the data step contains an explicit output statement, in which case the implicit one is not present.
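The implicit output at an explicit return can be sketched as follows, assuming the question's test dataset (early and status are hypothetical names). Rows with i up to 10 leave the step at the return, so they are written with status 'early'; the rest reach the run boundary and are written with status 'late'.

```sas
/* Rebuild the 100-row dataset from the question */
data test;
   do i = 1 to 100;
      output;
   end;
run;

/* No explicit OUTPUT, so every RETURN (explicit or the one */
/* implied by RUN) writes the current row first             */
data early;
   set test;
   length status $5;
   status = 'early';
   if i <= 10 then return;  /* row written here, then back to the top */
   status = 'late';
run;                        /* implicit RETURN and OUTPUT for i > 10  */
```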
It is return that causes SAS to stop processing the rest of the step, not output. In fact, SAS happily executes statements after an output statement; their results just may not be written anywhere. For example:
data x;
   do row = 1 to 100;
      output;
      row_prev+1;  /* sum statement: retained, starts at 0 */
   end;
run;
That row_prev+1 statement is executed even though it comes after the output statement; its effect shows up on the next row written (row_prev is always one less than row). In your example where you told it to output only three rows, it still processed the other 97; nothing was output from them. Any side effects of that processing still happen; in fact, the incrementing of _n_ is one of them (_n_ is not the row number, but the iteration count of the data step loop).
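A minimal sketch of the difference, with hypothetical names loop, row and iteration: a step with no set statement makes a single pass, so _n_ stays 1, yet the DO loop and explicit output still write three rows.

```sas
/* No SET statement, so the step makes a single pass (_N_ = 1), */
/* while the DO loop and explicit OUTPUT write three rows       */
data loop;
   do row = 1 to 3;
      iteration = _n_;  /* copy _N_ so it is kept on the output */
      output;
   end;
run;
```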
You should probably read up on the data step itself. SAS documentation includes a lot of information on that, or you could read papers like The Essence of Data Step Programming. This topic comes up often in SGF (SAS Global Forum) papers, in part because SAS certification requires understanding it fairly well.
Upvotes: 3