Konstantin Maksimov

Reputation: 31

SAS proc sort algorithm

I googled but couldn't find any information about which algorithm proc sort uses behind the scenes in SAS. In Python, for example, sort() uses Timsort.

Upvotes: 3

Views: 1472

Answers (1)

user667489

Reputation: 9569

As Stu has observed, proc sort is closed source, so the best we can do is speculate. Having said that, rather than there being just one algorithm used in all situations, I suspect that the choice of sorting algorithm(s) depends on at least the following factors:

  • The platform on which SAS is running.
  • The libname engines through which the source and destination datasets are managed.
  • The settings used in the proc sort statement - in particular, noequals (which requests a slightly faster but unstable sort), tagsort and threads.
  • The amount of memory available for the sort as defined via the sortsize and memsize system options.
  • The size of the input dataset.
  • Whether any third-party sorting engines (e.g. SyncSort) are being called, rather than the SAS default ones, via the sortpgm, sortcutp and other associated system options.
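
To make the stable/unstable distinction behind noequals concrete, here is a minimal sketch in Python (the language the question mentions). It says nothing about how SAS implements either mode; the records and the helper is_sorted are made up for illustration. The point is only that two different outputs can both be correctly sorted by the key, and a stable sort is the one that keeps tied rows in their input order.

```python
from operator import itemgetter

# Hypothetical records: (by-variable value, original observation number).
rows = [("b", 1), ("a", 2), ("b", 3), ("a", 4)]
by_key = itemgetter(0)

# Python's sorted() is stable: rows with equal keys keep their input order.
stable = sorted(rows, key=by_key)

# A hand-written ordering that is just as "sorted" by key, but in which the
# tied rows no longer appear in their original order. An unstable sort is
# allowed to return something like this.
unstable_example = [("a", 4), ("a", 2), ("b", 3), ("b", 1)]


def is_sorted(seq, key):
    """Return True if seq is in non-decreasing order of key."""
    keys = [key(r) for r in seq]
    return all(x <= y for x, y in zip(keys, keys[1:]))


print(stable)                               # [('a', 2), ('a', 4), ('b', 1), ('b', 3)]
print(is_sorted(unstable_example, by_key))  # True
```

Both lists satisfy the ordering on the by-variable, so giving up stability leaves a sort routine free to skip the bookkeeping needed to preserve input order among ties.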

It is worth noting that SAS has been around through many generations of computer hardware, and the optimal choice of sorting algorithm is heavily dependent on the hardware. Even bubble sort can theoretically be optimal on old enough systems. I would very much expect SAS to account for this sort of thing.
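
As an illustration of how available memory alone can change which algorithm makes sense, here is a minimal Python sketch of a textbook external merge sort: sort chunks that fit in memory, spill each sorted run to disk, then merge the runs. This is purely an assumption-laden example of the general idea, not a description of what proc sort does internally; the function name external_sort and the chunk_size parameter are hypothetical.

```python
import heapq
import tempfile


def external_sort(values, chunk_size):
    """Sort integers using sorted runs spilled to temporary files.

    chunk_size stands in for "how many values fit in memory at once".
    """
    runs = []

    def spill(chunk):
        # Sort one in-memory chunk and write it out as a sorted run.
        f = tempfile.TemporaryFile(mode="w+")
        f.writelines(f"{v}\n" for v in sorted(chunk))
        f.seek(0)
        runs.append(f)

    chunk = []
    for v in values:
        chunk.append(v)
        if len(chunk) >= chunk_size:
            spill(chunk)
            chunk = []
    if chunk:
        spill(chunk)

    try:
        # Lazily read each run back and k-way merge the sorted streams.
        streams = [(int(line) for line in f) for f in runs]
        # Collected into a list here for simplicity; a real external sort
        # would stream the merged output to its destination instead.
        return list(heapq.merge(*streams))
    finally:
        for f in runs:
            f.close()


print(external_sort([5, 3, 9, 1, 7, 2], chunk_size=2))  # [1, 2, 3, 5, 7, 9]
```

With a large chunk_size this degenerates into a single in-memory sort, while a small one forces many runs and a wider merge, which is the kind of trade-off that memory settings and input size would plausibly influence.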

Upvotes: 2
