itzy
itzy

Reputation: 11755

Order of rows returned by PROC SQL

I'm wondering about the order in which data is returned by PROC SQL if there is no sort or group by statement. Is it always the same?

For example, suppose I do this:

proc sql;
    create table cusips as
        select a.cusip as c1, b.cusip as c2
        from shrs a, shrs b;
quit;

which just gives me a list of every combination of two cusip variables. Will the order of these be the same as in the original input data? It looks like it is in my case, but I'm wondering if I can be sure that this will always be true.

Upvotes: 2

Views: 228

Answers (1)

Joe
Joe

Reputation: 63424

PROC SQL does not guarantee the same order in any circumstance. In general, if you're doing a simple select from a single table, it will probably return in the order the data was already in, largely because it's more work to not do that.

However, PROC SQL if it decides it is optimal to do so, will do things like split data into chunks and process them separately in parallel - not as much as it should probably, but it will sometimes nonetheless. This happens frequently with a join, in particular as it may do it as a hash join in which case you could get it in one table's order or both, or a cartesian product, or a few other options.

In particular, in your join above, the SQL optimizer might choose to do it as a cartesian product, as a hash join, as an index join, or several other methods. Each of them will likely lead to a different result. Even within that, it's possible that if the join is done with multiple threads, the threads may be balanced differently depending on the conditions of the server/computer it is performed on at the time (ie, if one CPU is under more/less load from other processes, it may get less/more of the data).

As such, you should never rely on it returning data back in order - always ask for it to be returned in order.

This is also why the monotonic() function is not recommended for production code (as it may not always return the expected values if the final order is different from the original order).

If you are looking to get the data back in the original order and don't know the original order or don't want to specify the complete order, one good workaround is to use a datastep view to get the original order. Datastep views will process the data in order.

data myview/view=myview;
  set mydata;
  _ordervar+1;
run;

proc sql;
  select * from myview
    order by _ordervar;
quit;

Upvotes: 2

Related Questions