Matteo Lomi

Reputation: 13

How can I bulk insert rows only if a compound primary key doesn't already exist? [AWS Redshift]

In Amazon Redshift I am trying to bulk insert rows into a table from a temp table. However, I only want to insert the rows whose combination of values (the compound primary key) does not already exist in the target table, to avoid adding duplicates.

Below is the DDL of the tables.

• clusters_typologies table (the table I want to insert data into)

create table if not exists clusters.clusters_typologies
(
    cluster_id  BIGINT,
    typology_id BIGINT,
    semantic_id BIGINT,
    primary key (cluster_id, typology_id, semantic_id)
);

The temp table is created with the query below, and after that all fields are populated correctly.

CREATE TEMPORARY TABLE temporary (
  cluster_id   bigint,
  typology_name varchar(100),
  typology_id   bigint,
  semantic_name varchar(100),
  semantic_id   bigint
);

Now, when I try to insert with this query:

INSERT INTO clusters.clusters_typologies (cluster_id, typology_id,semantic_id)
    (SELECT temp.cluster_id, temp.typology_id, temp.semantic_id
     FROM temporary temp
     WHERE NOT EXISTS(SELECT 1
                      FROM clusters_typologies
                      where cluster_id = temp.cluster_id
                        and typology_id = temp.typology_id
                        and semantic_id = temp.semantic_id));

I get this error and cannot figure out how to make it work.

Invalid operation: This type of correlated subquery pattern is not supported due to internal error;

Does anyone know how to fix this, or what the best way is to insert into a table with a compound key while avoiding duplicates?

Thanks.

Upvotes: 1

Views: 668

Answers (2)

Matteo Lomi

Reputation: 13


After some attempts I figured out how to insert from a temp table while checking the compound primary key to avoid duplicates.

Basically, from the AWS documentation that @Jon Scott sent, I understand that this way of referencing the outer table inside the inner select is not supported by Redshift.

I solved it using a left join and checking whether the joined columns are null.
Below is the query I use now.

INSERT INTO clusters.clusters_typologies (cluster_id, typology_id, semantic_id)
    (SELECT temp.cluster_id, temp.typology_id, temp.semantic_id
     FROM aaaa temp
            LEFT JOIN clusters.clusters_typologies clu_typ ON temp.cluster_id = clu_typ.cluster_id AND
                                                              temp.typology_id = clu_typ.typology_id AND
                                                              temp.semantic_id = clu_typ.semantic_id
     WHERE clu_typ.cluster_id IS NULL
       AND clu_typ.typology_id IS NULL
       AND clu_typ.semantic_id IS NULL);
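
One caveat: Redshift treats primary key constraints as informational and does not enforce them, so the anti-join above only protects against keys that are already in the target table. If the temp table itself might contain the same key more than once, a variant with SELECT DISTINCT could look like the sketch below. It assumes the temp table is named temporary, as in the question, and the alias src is just illustrative.

```sql
-- Sketch only: assumes the temp table from the question is named "temporary".
INSERT INTO clusters.clusters_typologies (cluster_id, typology_id, semantic_id)
SELECT DISTINCT src.cluster_id, src.typology_id, src.semantic_id  -- collapse repeats inside the temp table
FROM temporary src
LEFT JOIN clusters.clusters_typologies clu_typ
       ON src.cluster_id  = clu_typ.cluster_id
      AND src.typology_id = clu_typ.typology_id
      AND src.semantic_id = clu_typ.semantic_id
-- One joined key column being NULL is enough to show that no match was found.
WHERE clu_typ.cluster_id IS NULL;
```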

Upvotes: 0

Jon Scott

Reputation: 4354

To upsert, follow this guide: https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-upsert.html

Also note that certain types of correlated subquery are not allowed in Redshift. That is the cause of your error; see https://docs.aws.amazon.com/redshift/latest/dg/r_correlated_subqueries.html
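
The staging-table idea from the first link can be adapted to this case roughly as follows. This is only a sketch, not the exact recipe from the guide: it assumes the temp table from the question is named temporary, and that rows already present in the target should be left untouched rather than replaced.

```sql
-- Sketch only: "temporary" is the temp (staging) table from the question.
BEGIN;

-- Drop staging rows whose compound key already exists in the target table.
DELETE FROM temporary
USING clusters.clusters_typologies
WHERE temporary.cluster_id  = clusters_typologies.cluster_id
  AND temporary.typology_id = clusters_typologies.typology_id
  AND temporary.semantic_id = clusters_typologies.semantic_id;

-- Insert what is left; DISTINCT guards against repeated keys within the staging table.
INSERT INTO clusters.clusters_typologies (cluster_id, typology_id, semantic_id)
SELECT DISTINCT cluster_id, typology_id, semantic_id
FROM temporary;

COMMIT;
```

Running both statements in one transaction means other sessions do not see the target table in an intermediate state.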

Upvotes: 0
