Beginner

Reputation: 13

SAS code, sort twice?

proc lifetest data=hodgkins outsurv=KM_data /*noprint*/;
   time maltime*mcens(0);
   strata age_gt30; 
run;    

proc sort data=KM_data; 
   by age_gt30; 
run;

data KM_data; 
   set KM_data; 
   by age_gt30; 
/* My question: Why is the data being sorted by age_gt30 twice? */
   Failure = 1-Survival; 
   output; 
run;

Many thanks, DomPazz and Peter Flom! I was surprised that creating a new variable (named Failure) would need a BY statement. It would help me understand if you could say whether 1) and 2) below do the same thing. (P.S. age_gt30 is either 0 or 1.)

1)
proc lifetest data=hodgkins outsurv=KM_data;
   time maltime*mcens(0);
   strata age_gt30;
run;

proc sort data=KM_data;
   by age_gt30;
run;

data KM_data;
   set KM_data;
   by age_gt30;
   Failure = 1-Survival;
   output;
run;

2)
proc lifetest data=hodgkins outsurv=KM_data;
   time maltime*mcens(0);
   strata age_gt30;
run;

data KM_data;
   set KM_data;
   Failure = 1-Survival;
   output;
run;

Upvotes: 0

Views: 331

Answers (2)

DomPazz

Reputation: 12465

The data set is only being sorted once.

proc sort data=KM_data; 
   by age_gt30; 
run;

The DATA step at the end calculates a variable named Failure. It also contains a BY statement, which requires the input to already be sorted by age_gt30. I think that BY statement is the source of your confusion: it does not sort anything. It tells the DATA step to create temporary variables (FIRST.age_gt30 and LAST.age_gt30, which are not written to the output data set) that mark the start and end of each group of observations with the same BY value.

However, in the code for that data step, there is no need for that BY statement. Nothing is being done within the by groups.

This simplified code does the same thing as that data step.

data KM_data; 
   set KM_data; 
   Failure = 1-Survival; 
run;
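
For contrast, here is a rough sketch of a step where the BY statement would actually matter, because it uses the temporary FIRST./LAST. flags. The data set names KM_toy and KM_flags, the toy values, and the variables first_in_group and last_in_group are made up for illustration; they are not from your code.

```sas
/* Hypothetical toy data standing in for the OUTSURV= data set */
data KM_toy;
   input age_gt30 maltime Survival;
   datalines;
0 0 1.00
0 5 0.90
0 9 0.75
1 0 1.00
1 4 0.80
;
run;

/* The toy rows are already in age_gt30 order; the sort is shown
   only because a BY statement in a DATA step relies on that order */
proc sort data=KM_toy;
   by age_gt30;
run;

data KM_flags;
   set KM_toy;
   by age_gt30;
   Failure = 1 - Survival;
   /* FIRST./LAST. flags are temporary and are not written out,
      so copy them into ordinary variables to inspect them */
   first_in_group = first.age_gt30;
   last_in_group  = last.age_gt30;
run;

proc print data=KM_flags;
run;
```

In the printed output, first_in_group should be 1 on the first row of each age_gt30 group and last_in_group should be 1 on the last row. If a step never looks at those flags, as in your code, the BY statement changes nothing in the result.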

Upvotes: 3

Peter Flom

Reputation: 2416

It's not being sorted twice. It's sorted once (by PROC SORT). The BY statement in the DATA step doesn't sort anything, but SAS requires the data to already be sorted by the BY variable before it can process a BY statement.
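
A small sketch of that requirement, using made-up toy data (the data set names unsorted, sorted and with_by are hypothetical, not from the question):

```sas
/* Hypothetical toy data, deliberately NOT in age_gt30 order */
data unsorted;
   input age_gt30 Survival;
   datalines;
1 0.90
0 0.80
1 0.70
;
run;

/* A DATA step with "by age_gt30;" run directly on UNSORTED would stop
   with an error saying the BY variables are not properly sorted. */

/* Sorting first puts the rows in the order the BY statement expects */
proc sort data=unsorted out=sorted;
   by age_gt30;
run;

data with_by;
   set sorted;
   by age_gt30;
   Failure = 1 - Survival;
run;
```

So PROC SORT does the sorting, and the BY statement only relies on the order that the sort produced.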

Upvotes: 1
