peipei
peipei

Reputation: 1497

Redshift psql auto increment on even number

I am trying to create a table with an auto-increment column as below. Since Redshift psql doesn't support SERIAL, I had to use IDENTITY data type:

IDENTITY(seed, step)
Clause that specifies that the column is an IDENTITY column. An IDENTITY column contains unique auto-generated values. These values start with the value specified as seed and increment by the number specified as step. The data type for an IDENTITY column must be either INT or BIGINT.`

My create table statement looks like this:

CREATE TABLE my_table(
        id INT IDENTITY(1,1),
        name CHARACTER VARYING(255) NOT NULL,
        PRIMARY KEY( id ) 
);

However, when I tried to insert data into my_table, rows increment only on the even number, like below:

 id | name | 
----+------+
  2 | anna |
  4 | tom  |
  6 | adam |
  8 | bob  |
 10 | rob  |

My insert statements look like below:

INSERT INTO my_table ( name ) 
VALUES ( 'anna' ), ('tom') , ('adam') , ('bob') , ('rob' );

I am also having trouble with bringing the id column back to start with 1. There are solutions for SERIAL data type, but I haven't seen any documentation for IDENTITY. Any suggestions would be much appreciated!

Upvotes: 21

Views: 46647

Answers (3)

user1741851
user1741851

Reputation:

This issue is discussed at length in AWS forum.

https://forums.aws.amazon.com/message.jspa?messageID=623201

The answer from the AWS.

Short answer to your question is seed and step are only honored if you disable both parallelism and the COMPUPDATE option in your COPY. Parallelism is disabled if and only if you're loading your data from a single file, which is what we normally do not recommend, and hence will be an unlikely scenario for most users.

Parallelism impacts things because in order to ensure that there is no single point of contention in assigning identity values to rows, there end up being gaps in the value assignment. When parallelism is disabled, the load is happening serially, and therefore, there is no issue with assigning different id values in parallel.

The reason COMPUPDATE impacts things is when it's enabled, the COPY is actually making 2 passes over your data. During the first pass, it internally increments the identity values, and as a result, your initial value starts with a larger value than you'd expect.

We'll update the doc to reflect this.

Also multiple nodes seems to cause such effect with IDENTITY column. In essence it can only provide you with guaranteed unique IDs.

Upvotes: 6

Andrew Fogg
Andrew Fogg

Reputation: 695

Set your seed value to 1 and your step value to 1.

Create table

CREATE table my_table(
    id bigint identity(1, 1),
    name varchar(100),
    primary key(id));

Insert rows

INSERT INTO organization ( name ) 
VALUES ('anna'), ('tom') , ('adam'), ('bob'), ('rob');

Results

 id | name | 
----+------+
  1 | anna |
  2 | tom  |
  3 | adam |
  4 | bob  |
  5 | rob  |

For some reason, if you set your seed value to 0 and your step value to 1 then the integer will increase in steps of 2.

Create table

CREATE table my_table(
    id bigint identity(0, 1),
    name varchar(100),
    primary key(id));

Insert rows

INSERT INTO organization ( name ) 
VALUES ('anna'), ('tom') , ('adam'), ('bob'), ('rob');

Results

 id | name | 
----+------+
  0 | anna |
  2 | tom  |
  4 | adam |
  6 | bob  |
  8 | rob  |

Upvotes: 6

Jorge
Jorge

Reputation: 674

You have to set your identity as follows:

id INT IDENTITY(0,1)

Source: http://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_TABLE_examples.html

And you can't reset the id to 0. You will have to drop the table and create it back again.

Upvotes: 18

Related Questions