Redshift tables are not preserving the SAS sort order after loading the data into Redshift

Question

When I load a SAS dataset sorted by a key(contains duplicate keys) using PROC SORT into Redshift, the same sorting order is not preserved in Redshift. For eg. When I compare the SAS dataset with Redshift table, they are not in the same sorting order, the data is really jumbled within the duplicates keys and very hard to find a pattern of how data stores in Redshift columns or why that's happening.

SAS dataset vs Redshift table sort order ]1

Any suggestions or guidance on maintaining the same sorting after data load into Redshift from SAS would be very helpful. Thanks!

The code used:

PROC APPEND BASE = Target_Table (bulkload=yes bl_compress=yes bl_bucket='xxx' bl_default_dir = 'xxx' bl_use_escape =YES) 
DATA = Source_Table force; 
RUN;

Allan Bowe · Accepted Answer

You need to set the sortkey when creating the table, eg:

proc sql;
connect using myredlib as rs;
execute (create table iwant (
   key INTEGER
  ,data char(1)
  ,load_dttm TIMESTAMP
)
distkey(key)
compound sortkey(key,data) )by rs;

If you have duplicates or you just need to preserve the source data order then create a new column (eg id) in a data step prior to the append, eg:

data iwant;
  set ihave;
  id=_n_;
run;
proc append .....

Redshift tables are not preserving the SAS sort order after loading the data into Redshift

Answers (2)

Related Questions