Shashank Prasad
Shashank Prasad

Reputation: 1

Does using BY statement in SAS auto retains variables or not

I am trying to find cumulative sum for my dataset based on name. I have following 2 methods of doing so with very little difference.

The dataset looks like below

1 David 45
1 David 74
2 Sam 45
2 Ram 54
3 Bane 87
3 Mary 92
3 Bane 87
4 Dane 23
5 Jenny 87
5 Ken 87
6 Simran 63
8 Priya 72

PROC SORT DATA = READIN;
BY ID;
RUN;



DATA READIN1;
SET READIN;
BY ID;
if first.id then 
cum_score=score;
else cum_score+score;
run;


DATA READIN2;
SET READIN;
if first.id then
cum_score=score;
else score+cum_score;
run;

While the first method gives me correct cumulative frequency, also suggesting that in BY group processing, newly created variables are auto retained, the second method gives incorrect results suggesting new variables aren't auto retained I by group processing. Interestingly, the only change in the two methods are the cum_score+score is changed to score+cum_score(which shall give same result according to laws of addition)

Please explain what is happening here.

Upvotes: 0

Views: 80

Answers (2)

Tom
Tom

Reputation: 51601

This has nothing to do with the BY processing. It is about the meaning of a sum statement and the impact of trying to retain variables that are being loaded by a SET statement.

A sum statement is of the form

variable + expression ;

It is essentially equivalent to these two statements:

retain variable 0 ;
variable = sum(variable,expression);

When you retain a variable then SAS will not reset it to missing at the start of the next data set iteration. So the value it had at the end of iteration N is "retained" to the start of iteration N+1. But if that variable is included from a dataset then the retained value will be lost when it is overwritten by the value read from the dataset.

Also note that all variables that are defined in input datasets are automatically retained, but since normally they are overwritten when the SET (or MERGE, UPDATE, etc) statement runs you don't notice it.

Upvotes: 1

Reeza
Reeza

Reputation: 21274

cum_score + score and score + cum_score are mathematically the same, but not the same in SAS.

A SUM statement has an implicit retain, but it matters what order the variables are in the statement. This is because variables that already exist, are not retained, at least not how you'd expect it to be retained, and if cum_score is first in the statement it will be retained.

Upvotes: 0

Related Questions