Reputation: 13
proc lifetest data=hodgkins outsurv=KM_data /*noprint*/;
time maltime*mcens(0);
strata age_gt30;
run;
proc sort data=KM_data;
by age_gt30;
run;
data KM_data;
set KM_data;
by age_gt30;
/* My question: why is age_gt30 being sorted twice? */
Failure = 1-Survival;
output;
run;
Many thanks, DomPazz and Peter Flom! I was surprised that creating a new variable (named Failure) would need a BY statement. It would help me understand if you could say whether 1) and 2) are equivalent. (P.S. age_gt30 is 0 or 1.)
1) proc lifetest data=hodgkins outsurv=KM_data; time maltime*mcens(0); strata age_gt30; run;
proc sort data=KM_data; by age_gt30; run;
data KM_data; set KM_data; by age_gt30; Failure = 1-Survival; output; run;
2) proc lifetest data=hodgkins outsurv=KM_data; time maltime*mcens(0); strata age_gt30; run;
data KM_data; set KM_data; Failure = 1-Survival; output; run;
Upvotes: 0
Views: 331
Reputation: 12465
The data set is only being sorted once.
proc sort data=KM_data;
by age_gt30;
run;
The data step at the end calculates a variable named Failure. I THINK your confusion comes from the BY statement in that step. A BY statement does not sort anything; it requires that the data already be sorted by the BY variable, which is why the PROC SORT comes first. It tells the data step to create temporary variables (not written to the output data set), FIRST.age_gt30 and LAST.age_gt30, that help you find the start and end of each run of rows with the same BY group value.
However, that data step has no need for the BY statement, because nothing is being done within the BY groups.
This simplified code does the same thing as that data step.
data KM_data;
set KM_data;
Failure = 1-Survival;
run;
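For contrast, here is a rough sketch of a data step where the BY statement would actually matter, because it uses the temporary FIRST. and LAST. variables to do something within each BY group. The output data set name KM_counts and the variable n_rows are made up for illustration; the step assumes KM_data is already sorted by age_gt30.

```sas
/* Sketch only: count the rows in each age_gt30 group.        */
/* KM_counts and n_rows are hypothetical names.                */
data KM_counts;
    set KM_data;
    by age_gt30;                        /* KM_data must be sorted by age_gt30 */
    if first.age_gt30 then n_rows = 0;  /* reset at the start of each BY group */
    n_rows + 1;                         /* sum statement: value is retained across rows */
    if last.age_gt30 then output;       /* write one row per BY group */
    keep age_gt30 n_rows;
run;
```

Without the BY statement, first.age_gt30 and last.age_gt30 would not exist, so a step like this could not tell where one group ends and the next begins. The Failure calculation works row by row and never needs that information.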
Upvotes: 3
Reputation: 2416
It's not being sorted twice. It's sorted once (by PROC SORT); the BY statement in the data step doesn't sort anything, but SAS requires the data to already be sorted by that variable before a BY statement can be used.
Upvotes: 1