Reputation: 13
In Amazon Redshift I am trying to bulk insert rows into a table from a temp table. However, I only want to insert the rows whose combination of values (the compound primary key) does not already exist in the target table, to avoid adding duplicates.
Below is the DDL of the tables.
• clusters_typologies table (the table I want to insert data into)
create table if not exists clusters.clusters_typologies
(
cluster_id BIGINT,
typology_id BIGINT,
semantic_id BIGINT,
primary key (cluster_id, typology_id, semantic_id)
);
The temp table is created with the query below, and afterwards all fields are populated correctly.
CREATE TEMPORARY TABLE temporary (
cluster_id bigint,
typology_name varchar(100),
typology_id bigint,
semantic_name varchar(100),
semantic_id bigint
);
Now, when I try to insert with this query:
INSERT INTO clusters.clusters_typologies (cluster_id, typology_id,semantic_id)
(SELECT temp.cluster_id, temp.typology_id, temp.semantic_id
FROM temporary temp
WHERE NOT EXISTS(SELECT 1
FROM clusters_typologies
where cluster_id = temp.cluster_id
and typology_id = temp.typology_id
and semantic_id = temp.semantic_id));
I get this error and cannot figure out how to make it work:
Invalid operation: This type of correlated subquery pattern is not supported due to internal error;
Does anyone know how to fix this, or what the best way is to insert into a table with a compound key while avoiding duplicates?
Thanks.
Upvotes: 1
Views: 668
Reputation: 13
After a few attempts I figured out how to insert from a temp table while checking against a compound primary key to avoid duplicates.
From the AWS documentation that @Jon Scott linked, I understand that referencing the outer table inside the inner select the way my original query did is not supported by Redshift.
I solved it by using a left join and checking whether the joined columns are null.
Below is the query I use now.
INSERT INTO clusters.clusters_typologies (cluster_id, typology_id, semantic_id)
(SELECT temp.cluster_id, temp.typology_id, temp.semantic_id
FROM aaaa temp
LEFT JOIN clusters.clusters_typologies clu_typ ON temp.cluster_id = clu_typ.cluster_id AND
temp.typology_id = clu_typ.typology_id AND
temp.semantic_id = clu_typ.semantic_id
WHERE clu_typ.cluster_id IS NULL
AND clu_typ.typology_id IS NULL
AND clu_typ.semantic_id IS NULL);
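A slightly tighter variant of the same anti-join is sketched below, under a few assumptions: the temp table is the one named temporary in the question, the alias src is only illustrative, and cluster_id is never NULL in the target table. Redshift treats primary key constraints as informational and does not enforce them, so rows duplicated inside the temp table itself would still be inserted by a plain anti-join; SELECT DISTINCT is added to guard against that.

```sql
-- Sketch only: table "temporary" is taken from the question, alias "src" is illustrative.
INSERT INTO clusters.clusters_typologies (cluster_id, typology_id, semantic_id)
SELECT DISTINCT src.cluster_id, src.typology_id, src.semantic_id  -- collapse duplicates within the temp table
FROM temporary src
LEFT JOIN clusters.clusters_typologies clu_typ
       ON src.cluster_id  = clu_typ.cluster_id
      AND src.typology_id = clu_typ.typology_id
      AND src.semantic_id = clu_typ.semantic_id
-- An unmatched row has every clu_typ column NULL, so testing one join column is enough
-- (assumes cluster_id is never NULL in the target table).
WHERE clu_typ.cluster_id IS NULL;
```

Note that a temp row with a NULL in any of the three key columns never matches in the join, so it would be inserted on every run; filter such rows out first if they can occur.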
Upvotes: 0
Reputation: 4354
To upsert, follow this guide: https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-upsert.html
Note that certain types of correlated subquery are not allowed in Redshift; that is the cause of your error. See https://docs.aws.amazon.com/redshift/latest/dg/r_correlated_subqueries.html
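As a rough sketch of the staging-table merge that the first guide describes, applied to the tables from the question: delete the target rows that also exist in the staging table, then insert everything from staging, all inside one transaction. The table names come from the question; the DISTINCT is an addition, not part of the guide, and the exact statement form should be checked against the linked documentation.

```sql
-- Sketch only, assuming the temp table "temporary" from the question is the staging table.
BEGIN TRANSACTION;

-- Remove target rows whose compound key is also present in the staging table.
DELETE FROM clusters.clusters_typologies
USING temporary
WHERE clusters.clusters_typologies.cluster_id  = temporary.cluster_id
  AND clusters.clusters_typologies.typology_id = temporary.typology_id
  AND clusters.clusters_typologies.semantic_id = temporary.semantic_id;

-- Re-insert from staging; DISTINCT guards against duplicates inside the staging table.
INSERT INTO clusters.clusters_typologies (cluster_id, typology_id, semantic_id)
SELECT DISTINCT cluster_id, typology_id, semantic_id
FROM temporary;

END TRANSACTION;
```

Because the target table here holds only the key columns, deleting and re-inserting the matching rows leaves them unchanged, so the net effect is the same as inserting only the missing keys.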
Upvotes: 0