Mohan
Mohan

Reputation: 15

Redshift tables are not preserving the SAS sort order after loading the data into Redshift

When I load a SAS dataset sorted by a key(contains duplicate keys) using PROC SORT into Redshift, the same sorting order is not preserved in Redshift. For eg. When I compare the SAS dataset with Redshift table, they are not in the same sorting order, the data is really jumbled within the duplicates keys and very hard to find a pattern of how data stores in Redshift columns or why that's happening.

SAS dataset vs Redshift table sort order]1

Any suggestions or guidance on maintaining the same sorting after data load into Redshift from SAS would be very helpful. Thanks!

The code used:

PROC APPEND BASE = Target_Table (bulkload=yes bl_compress=yes bl_bucket='xxx' bl_default_dir = 'xxx' bl_use_escape =YES) 
DATA = Source_Table force; 
RUN;

Upvotes: 0

Views: 414

Answers (2)

Allan Bowe
Allan Bowe

Reputation: 12691

You need to set the sortkey when creating the table, eg:

proc sql;
connect using myredlib as rs;
execute (create table iwant (
   key INTEGER
  ,data char(1)
  ,load_dttm TIMESTAMP
)
distkey(key)
compound sortkey(key,data) )by rs;

If you have duplicates or you just need to preserve the source data order then create a new column (eg id) in a data step prior to the append, eg:

data iwant;
  set ihave;
  id=_n_;
run;
proc append .....

Upvotes: 3

Richard
Richard

Reputation: 27508

It depends on the state of the target table at the start of append.

IIRC by default new tables have no inherent sort order and a sorted result set is only guaranteed if a SQL query has an ORDER BY clause.

Tables can be have features such as SORT ORDER and INTERLEAVED SORT KEYS, but would have to be added RedShift statement performed via Proc SQL pass through EXECUTE BY or Amazon Redshift console.

See SO Why does Redshift not need materialized views or indexes? for more helpful information.

Upvotes: 0

Related Questions