Reputation: 3153
I have a query that inserts data into a table.
The process first creates the end table in the database:
CREATE TABLE IF NOT EXISTS {new_schema}.{new_table} (LIKE {schema}.{table} INCLUDING DEFAULTS);
Then, it inserts into the stg table using:
INSERT INTO {new_schema}.{new_table} (columns....) SELECT columns... FROM {table} and some other logic
This inserts 4,000 rows and takes around 2 minutes. The stg table is completely empty before the insert; the end table has around 450,000 rows.
Both the end and stg tables have DISTSTYLE EVEN and SORTKEY(date), where the sort key column is defined as date date encode az64.
I would like to know how I can improve that insert performance. I read in the AWS Redshift best practices that you can load data ordered by the sort key, so that you avoid a vacuum.
In my case, does that mean using the following statement?
SELECT columns... FROM {table} and some other logic ORDER BY date
Maybe, in my specific case, since I can have a lot of data in the source tables (the load into stg) but not as much in the stg table itself, I should drop the sortkey on the stg table, insert into it as defined above (without ORDER BY), and then insert from the stg table into the end table using ORDER BY, as in the sketch below. This would keep the ordering complexity out of the first big query and add it only to the one that handles 4k rows.
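For reference, a rough sketch of that two-step flow (placeholders as above; {stg_table}, {end_table}, and the column names are just illustrative):

-- Step 1: heavy query into the stg table, which has no sortkey
INSERT INTO {new_schema}.{stg_table} (col1, col2, date)
SELECT col1, col2, date FROM {schema}.{table} -- and some other logic
;
-- Step 2: small, ordered insert into the end table with SORTKEY(date)
INSERT INTO {new_schema}.{end_table} (col1, col2, date)
SELECT col1, col2, date FROM {new_schema}.{stg_table}
ORDER BY date;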
Upvotes: 0
Views: 579
Reputation: 11032
Sharing some performance statistics of the query would be helpful, but I think we can start with a few basic questions. You point to the insert as taking a lot of time; however, there's no data on the time it takes for the sourcing select to execute. My first suggestion is to look at the run time of this select. While it is possible that the insert is taking the time, I suspect this is where most of the time is spent.
What is the runtime of the select by itself? Since this query produces a lot of output, you don't want it to flow to your SQL workbench, as that would just measure the speed of your internet connection. If you have a sort clause on this query, you can just limit the number of rows produced; since the entire query needs to run to complete the sort, this time should be close enough. Otherwise you will want to set up a cursor and use that to capture the results (remember that the query doesn't run until the first fetch).
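A sketch of the cursor approach (the cursor name is arbitrary, the SELECT is the placeholder from your question, and Redshift cursors must be declared inside a transaction):

BEGIN;
-- Nothing executes at DECLARE time
DECLARE probe_cur CURSOR FOR
SELECT columns... FROM {table} -- and some other logic
;
-- The query runs on the first FETCH; time this statement
FETCH 1 FROM probe_cur;
CLOSE probe_cur;
COMMIT;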
I expect you will see that this query is taking most of the time and normal query optimization processes can be followed. (Look at amount of data read, data growth, redistribution, etc.)
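For example, once you have the query id (from the console or STL_QUERY), a per-step profile along these lines shows where the rows, bytes, and disk-based steps are (123456 is a placeholder id):

SELECT query, seg, step, label, rows, bytes, is_diskbased
FROM svl_query_summary
WHERE query = 123456
ORDER BY seg, step;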
Now, if it does turn out that a meaningful portion of the time is spent in the insert, there are a few things to look at. You mention ordering your query results per the target table's sort keys. This is in general a good idea; however, when the target table is empty, Redshift does this by default. Also, vacuum time would not be counted against the runtime of this insert.
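You can check whether the data actually landed in sort order (i.e. no vacuum pending) with SVV_TABLE_INFO; unsorted is the percent of rows out of sortkey order, so near zero means there is nothing to vacuum (the table name is a placeholder):

SELECT "schema", "table", tbl_rows, unsorted
FROM svv_table_info
WHERE "table" = 'your_end_table';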
The first thing I see that is suspect is the EVEN distribution of both tables. When a Redshift table is in EVEN distribution you should read that as "random distribution". So moving data from one "randomly distributed" table to another "randomly distributed" table will mean that a lot of the table data will need to be pushed around the cluster network. This network movement wastes time and consumes a limited resource (network bandwidth). There are a number of good reasons for tables to be in EVEN distribution but all of them have to do with "there's no better option". EVEN distribution, while the default, should be the distribution of last resort.
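If the two tables share a high-cardinality column that you also join or group on, distributing both on that column keeps rows node-local during the INSERT ... SELECT. A sketch, where the column choice (some_id_column) is an assumption you'd validate against your own joins:

-- Switch both tables from EVEN to KEY distribution on the same column
ALTER TABLE {schema}.{table} ALTER DISTSTYLE KEY DISTKEY some_id_column;
ALTER TABLE {new_schema}.{new_table} ALTER DISTSTYLE KEY DISTKEY some_id_column;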
Ordering the results for insert is another place where time can be spent. As I mentioned this is done if the target table is empty and it has sort keys. Your idea of removing sort keys from intermediate result tables has merit if sorting is costing you significant time.
Other information that will help to understand what is going on:

- Your cluster size / node type (table sizes are relative to cluster size)
- The query text (sanitized) and the explain plan of the query
- Table widths, if they are large
- Sort style (just tell me you're not using interleaved sort keys)
Upvotes: 2