Is sorting more favorable (efficient) in if-else statement?

Question

Assume two functions fun1, fun2 have been defined to carry out some calculation given input x.

The data set have is structured like this:

Day      Group  x
01Jul14  A      1.5
02Jul14  B      2.7

I want to do something like this:

data want;
  set have;
  if Group = 'A' then y = fun1(x);
  if Group = 'B' then y = fun2(x);
run;

Is it better to run proc sort data=have; by Group; run; first and then move on to the data step? Or does it not matter, because each iteration just reads one observation and determines which if statement it falls into?

Joe · Accepted Answer

So long as you are not doing anything to alter the normal sequential input of observations - such as using random access (point=), building a hash table, using a by statement, etc. - sorting will have no impact: SAS reads each row regardless of the if statements, evaluates both conditions, and executes whichever assignment applies. Nothing different occurs whether the data is sorted or unsorted.

This is easy to test. Write something like this:

%put Before Unsorted Time: %sysfunc(time(),time8.);
***your datastep here***;
%put After Unsorted Time: %sysfunc(time(),time8.);

proc sort data=your_dataset;
  by Group;  /* sort by the variable tested in the if statements */
run;

%put Before Sorted Time: %sysfunc(time(),time8.);
***your datastep here***;
%put After Sorted Time: %sysfunc(time(),time8.);

Or just run your datasteps and look at the execution time!
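
As a minimal sketch of the "look at the execution time" approach: the FULLSTIMER system option asks SAS to write more detailed timing notes (real time, CPU time) to the log after each step. The step below is the one from the question, so have, fun1 and fun2 are assumed to exist already. Run it once before and once after the proc sort and compare the notes in the log.

```sas
/* Ask SAS for detailed per-step timing notes in the log */
options fullstimer;

/* Data step from the question; have, fun1 and fun2 are assumed to be defined */
data want;
  set have;
  if Group = 'A' then y = fun1(x);
  if Group = 'B' then y = fun2(x);
run;
```

With only a handful of rows the difference will be lost in noise, so this is only meaningful on a reasonably large data set.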

You may be confusing this with sorting your if statements (i.e., changing the order in which they appear in the code). That could have an impact if your data is skewed and you use else: once a condition is true, SAS does not have to evaluate the conditionals further downstream, so putting the most common case first saves comparisons. It's not very common for this to make a measurable difference - it only matters when you have extremely skewed data, large numbers of observations, and certain other conditions in your code - so I wouldn't program for it.
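
To illustrate what reordering the conditions means, here is a sketch that assumes most observations have Group = 'A' (an assumption for illustration only; the question does not say how the groups are distributed). With else, SAS skips the second comparison whenever the first one is true.

```sas
/* Sketch only: assumes Group = 'A' is by far the most common value */
data want;
  set have;
  if Group = 'A' then y = fun1(x);       /* most frequent case is tested first */
  else if Group = 'B' then y = fun2(x);  /* evaluated only when Group is not 'A' */
run;
```

For two groups the saving is one character comparison per row, which is why this rarely matters in practice.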
