Derek
Derek

Reputation: 355

PostgreSQL add id column that increments based on data

I am relatively new to using Postgres, coming from a MySQL background. I am using Postgres 9.3.4 on Windows x64.

We are being supplied data in multiple fixed length text files. The first digit on each line is a number between 1 and 4 that indicates the record type of the data in that row. The rows are grouped sequentially such that there will always first be a row of type 1 followed by zero or more rows of the other types.

data_x.txt
---------------------
1data01
2data02
4data03
4data04
1data05
1data06
3data07

To import this into Postgres I have used the following SQL commands:

CREATE TABLE data_raw (
    raw_data TEXT
);

COPY data_raw FROM 'C:\path\data_x.txt' ...; -- Repeated for each file

ALTER TABLE data_raw
    ADD COLUMN indicator integer;

UPDATE data_raw SET
    indicator = CAST(substr(raw_data, 1, 1) AS integer),
    raw_data = substr(raw_data, 2);

I then create tables for each of the 4 record types:

CREATE TABLE table_1 SELECT raw_data FROM data_raw WHERE indicator = 1;
CREATE TABLE table_2 SELECT raw_data FROM data_raw WHERE indicator = 2;
CREATE TABLE table_3 SELECT raw_data FROM data_raw WHERE indicator = 3;
CREATE TABLE table_4 SELECT raw_data FROM data_raw WHERE indicator = 4;

What I need to do, but am unsure how, is to also add an "id" column for each group where the indicator starts with 1. We will be getting weekly updates so I need to specify the initial id for each batch. So if this batch starts at id = 225, then I want to get the following tables from the sample data:

table_1
id  | raw_data
--------------------
225 | data01
226 | data05
227 | data06

table_2
id  | raw_data
--------------------
225 | data02

table_3
id  | raw_data
--------------------
227 | data07

table_4
id  | raw_data
--------------------
225 | data03
225 | data04

Upvotes: 3

Views: 1415

Answers (3)

Derek
Derek

Reputation: 355

Thank you for the input.

Using Igor's suggestion I have come up with the following solution:

CREATE TABLE data_raw (
    raw_data TEXT
);

COPY data_raw FROM 'C:\path\data_x.txt' ...; -- Repeated for each file

ALTER TABLE data_raw
    ADD COLUMN pk_id serial,
    ADD COLUMN id integer,
    ADD COLUMN indicator integer;

UPDATE data_raw SET
    indicator = CAST(substr(raw_data, 1, 1) AS integer),
    raw_data = substr(raw_data, 2);

CREATE TABLE id_base AS
SELECT
    pk_id,
    sum(CASE WHEN indicator = 1 THEN 1 ELSE 0 END) OVER (ORDER BY pk_id) AS rec_id
FROM data_raw;

CREATE INDEX id_base_pk ON id_base USING btree(pk_id);

UPDATE data_raw r SET
    id = (SELECT rec_id FROM id_base b WHERE b.pk_id = r.pk_id);

DROP TABLE id_base;

Upvotes: 0

Andrzej Reduta
Andrzej Reduta

Reputation: 767

1) add PK,

2) Create dictionary table for type...

CREATE TABLE "public"."data_raw"(
 "id" Serial  NOT NULL,
 "id_type" Integer  NOT NULL,
 "raw_data" Text
);


CREATE TABLE "public"."data_raw_type"(
 "id" Serial  NOT NULL,
 "name" Character varying(30));

Upvotes: 0

Ihor Romanchenko
Ihor Romanchenko

Reputation: 28651

Try something like this to generate id for every data group:

SELECT sum(case when indicator = 1 then 1 else 0 end ) over(order by /*something to define the order*/) as id_base
from data_raw

It will generate an id_base for every data group. If you need to start from some particular id - just add this id to the id_base.

Upvotes: 1

Related Questions