user3600910
user3600910

Reputation: 3109

How to create an Index in Amazon Redshift

I'm trying to create indexes in Amazon Redshift but I received an error

create index on session_log(UserId);

UserId is an integer field.

Upvotes: 33

Views: 64499

Answers (3)

StuartLC
StuartLC

Reputation: 107237

If you try and create an index (with a name) on a Redshift table:

create index IX1 on "SomeTable"("UserId");

You'll receive the error

An error occurred when executing the SQL command: create index IX1 on "SomeTable"("UserId") ERROR: SQL command "create index IX1 on "SomeTable"("UserId")" not supported on Redshift tables.

This is because, like other data warehouses, Redshift uses columnar storage, and as a result, many of the indexing techniques (like adding non-clustered indexes) used in other RDBMS aren't applicable.

You do however have the option of providing a single sort key per table, and you can also influence performance with a distribution key for sharding your data, and selecting appropriate compression encodings for each column to minimize storage and I/O overheads.

For example, in your case, you may elect to use UserId as a sort key:

create table if not exists "SomeTable"
(
    "UserId" int,
    "Name" text
)
sortkey("UserId");

You might want to read a few primers like these

Upvotes: 70

user3600910
user3600910

Reputation: 3109

Redshift allow to create primary key

create table user (
id int ,
phone_number int,
primary key(id))

but since Redshift does not enforce this constraints, primary key accepts duplicate values.

attached article on that issue

http://www.sqlhaven.com/amazon-redshift-what-you-need-to-think-before-defining-primary-key/

Upvotes: 1

Ricardo Edo
Ricardo Edo

Reputation: 61

You can Define Constraints but will be informational only, as Amazon says: they are not enforced by Amazon Redshift. Nonetheless, primary keys and foreign keys are used as planning hints and they should be declared if your ETL process or some other process in your application enforces their integrity.

Some services like pipelines with insert mode (REPLACE_EXISTING) will need a primary key defined in your table.

For other performance purposes the Stuart's response is correct.

Upvotes: 2

Related Questions