Reputation: 3153
I am trying to do an Insert/Select but I am getting a duplicate key error.
INSERT INTO dbo.DESTINATION_TABLE
(
DocNumber,
TenantId,
UserId,
OtherField
)
SELECT
po.DocNumber,
po.TenantId,
po.CreatedById,
po.OtherField
FROM dbo.SOURCE_TABLE po
WHERE
po.DeletedById IS NULL AND
NOT EXISTS(
SELECT * FROM dbo.DESTINATION_TABLE poa
WHERE
poa.DocNumber = po.DocNumber AND
poa.TenantId = po.TenantId
)
DESTINATION_TABLE has a composite primary key of DocNumber and TenantId. DESTINATION_TABLE is empty at the time of running.
SOURCE_TABLE has a primary key of 'SourceTableId'.
But I keep getting an error
Violation of PRIMARY KEY constraint 'PK_dbo.DESTINATION_TABLE'. Cannot insert duplicate key in object 'dbo.DESTINATION_TABLE'. The duplicate key value is (DOC-99, some-tenant).
I have also tried
MERGE INTO DESTINATION_TABLE poa
USING (
SELECT
x.DocNumber,
x.TenantId,
x.CreatedById,
x.OtherField
FROM dbo.SOURCE_TABLE x
WHERE x.DeletedById IS NULL
) po
ON (poa.TenantId = po.TenantId AND poa.DocNumber = po.DocNumber)
WHEN NOT MATCHED THEN INSERT (DocNumber, TenantId, UserId, OtherField)
VALUES(po.DocNumber, po.TenantId, po.CreatedById, po.OtherField);
But I get the exact same result.
How is this happening? Is it because it is checking for 'NOT EXISTS' before running the insert? How do I fix this?
Upvotes: 0
Views: 987
Reputation: 95544
It seems that the flaw here is your understanding of how SQL works. SQL is a set-based language, so it works with the data in sets. For the above, this means that you define the data you want to INSERT
with your SELECT
and then all the rows that you define are INSERT
ed. For your EXISTS
this means that it only checks against rows that exist in the table prior to any of the rows being inserted, it does not insert each row one at a time and validate the EXISTS
prior to each row.
Let's take a very basic example with a single column table:
CREATE TABLE dbo.SomeTable (ID int CONSTRAINT PK_SomeTable PRIMARY KEY);
INSERT INTO dbo.SomeTable (ID)
VALUES(1),(3);
Now, let's say we want to insert the following dataset:
ID |
---|
1 |
2 |
2 |
4 |
If you performed the following query, this will generate the same error you had:
INSERT INTO dbo.SomeTable (ID)
SELECT V.ID
FROM (VALUES(1),(2),(2),(4))V(ID)
WHERE NOT EXISTS (SELECT 1
FROM dbo.SomeTable ST
WHERE ST.ID = V.ID);
Violation of PRIMARY KEY constraint 'PK_SomeTable'. Cannot insert duplicate key in object 'dbo.SomeTable'. The duplicate key value is (2).
This is because there are two rows with the value 2
for ID
that are trying to be INSERT
ed. Both rows are trying to be INSERT
ed because there are no rows in the table with the value 2
at the time the INSERT
occurs. If you want, you can validate what rows were tried to be INSERT
ed by commenting out the INSERT
clause:
--INSERT INTO dbo.SomeTable (ID)
SELECT V.ID
FROM (VALUES(1),(2),(2),(4))V(ID)
WHERE NOT EXISTS (SELECT 1
FROM dbo.SomeTable ST
WHERE ST.ID = V.ID);
ID |
---|
2 |
2 |
4 |
For your scenario, one method would be to use a CTE with ROW_NUMBER
to limit the results to a single row for a single primary key value.
WITH CTE AS
(SELECT po.DocNumber,
po.TenantId,
po.CreatedById,
po.OtherField,
ROW_NUMBER() OVER (PARTITION BY po.DocNumber, po.TenantId ORDER BY <Column(s) to ORDER BY to determine desired row>) AS RN
FROM dbo.SOURCE_TABLE po
WHERE po.DeletedById IS NULL
AND NOT EXISTS (SELECT *
FROM dbo.DESTINATION_TABLE poa
WHERE poa.DocNumber = po.DocNumber
AND poa.TenantId = po.TenantId))
INSERT INTO dbo.DESTINATION_TABLE (DocNumber,
TenantId,
UserId,
OtherField)
SELECT DocNumber,
TenantId,
CreatedById,
OtherField
FROM CTE
WHERE RN = 1;
Upvotes: 2
Reputation: 70513
Run the following query
SELECT po.DocNumber, po.TenantId, COUNT(*) as CNT
FROM dbo.SOURCE_TABLE po
GROUP BY po.DocNumber, po.TenantId
HAVING COUNT(*) > 1
It will show you the rows that are having the issue -- there is more than one of them in the source table.
You then need to figure out what business rule you want to use for the values of items with duplicate rows (x.CreatedById, x.OtherField), maybe you want the first one the max one etc. Then re-write your select query with a group by to only retrieve one row per key.
Upvotes: 0