Reputation: 15
When I load a SAS dataset sorted by a key(contains duplicate keys) using PROC SORT into Redshift, the same sorting order is not preserved in Redshift. For eg. When I compare the SAS dataset with Redshift table, they are not in the same sorting order, the data is really jumbled within the duplicates keys and very hard to find a pattern of how data stores in Redshift columns or why that's happening.
]1
Any suggestions or guidance on maintaining the same sorting after data load into Redshift from SAS would be very helpful. Thanks!
The code used:
PROC APPEND BASE = Target_Table (bulkload=yes bl_compress=yes bl_bucket='xxx' bl_default_dir = 'xxx' bl_use_escape =YES)
DATA = Source_Table force;
RUN;
Upvotes: 0
Views: 414
Reputation: 12691
You need to set the sortkey
when creating the table, eg:
proc sql;
connect using myredlib as rs;
execute (create table iwant (
key INTEGER
,data char(1)
,load_dttm TIMESTAMP
)
distkey(key)
compound sortkey(key,data) )by rs;
If you have duplicates or you just need to preserve the source data order then create a new column (eg id
) in a data step prior to the append, eg:
data iwant;
set ihave;
id=_n_;
run;
proc append .....
Upvotes: 3
Reputation: 27508
It depends on the state of the target table at the start of append.
IIRC by default new tables have no inherent sort order and a sorted result set is only guaranteed if a SQL query has an ORDER BY
clause.
Tables can be have features such as SORT ORDER
and INTERLEAVED SORT KEYS
, but would have to be added RedShift statement performed via Proc SQL pass through EXECUTE BY
or Amazon Redshift console.
See SO Why does Redshift not need materialized views or indexes? for more helpful information.
Upvotes: 0