Reputation: 1307
How to merge (upsert & delete orphan rows) to tableA
?
tableA
:
+---------+--------+----------+-------+
| company | option | category | rates |
+---------+--------+----------+-------+
| a | f | null | 2.5 |
+---------+--------+----------+-------+
| a | f | d | 2 | *
+---------+--------+----------+-------+
| a | g | e | 3 | **
+---------+--------+----------+-------+
| c | g | e | 4 |
+---------+--------+----------+-------+
| d | f | d | 1 |
+---------+--------+----------+-------+
*
denotes orphan row*.
**
denotes value to change (3 -> 4).
Only touch companies existing in tableB
(a
& c
in the example, leave d
alone).
tableB
:
+---------+--------+----------+-------+
| company | option | category | rates |
+---------+--------+----------+-------+
| a | f | null | 2.5 |
+---------+--------+----------+-------+
| a | g | e | 4 |
+---------+--------+----------+-------+
| c | g | e | 4 |
+---------+--------+----------+-------+
There is a unique index on (company, option, category)
in both tables.
Desired resulting tableA
:
+---------+--------+----------+-------+
| company | option | category | rates |
+---------+--------+----------+-------+
| a | f | null | 2.5 |
+---------+--------+----------+-------+
| a | g | e | 4 | <-
+---------+--------+----------+-------+
| c | g | e | 4 |
+---------+--------+----------+-------+
| d | f | d | 1 |
+---------+--------+----------+-------+
Only the second row (a,f,d,2)
was deleted and rates
was changed from 3 to 4 for (a,g,e)
.
Here is a fiddle: https://rextester.com/QUVC30763
I'm thinking to first delete the orphan row with this:
DELETE from tableA
USING tableB
WHERE
-- ignore rows with IDs that don't exist in tableB
tableA.company = tableB.company
-- ignore rows that have an exact all-column match in tableB
AND NOT EXISTS
(select * from tableB
where tableB.company is not distinct from tableA.company
AND tableB.option is not distinct from tableA.option
AND tableB.category is not distinct from tableA.category );
Then upsert with this:
INSERT INTO tableA (company, option, category, rates)
SELECT company, option, category, rates
FROM tableB
ON CONFLICT (company, option, category)
DO update
set rates= EXCLUDED.rates
WHERE
tableA.rates IS DISTINCT FROM
EXCLUDED.rates;
But the problem with the upsert function is that it can't handle nullable fields. I have to set -1
in place of null
or else the function won't be able to know if there are duplicates or not. I feel like setting -1
in place of null
will create many workarounds in the future, so I'd like to avoid that if I can.
Note: I found that INSERT ... ON CONFLICT ... DO UPDATE
is probably the way to go:
But I haven't seen a query suitable for my case. And I'm not sure if it's possible with nullable fields. Hence the question:
Is there a clean way to merge with nullable fields?
Upvotes: 1
Views: 1018
Reputation: 658242
I think you are on the right path. But there is a design problem with NULL
vs. UNIQUE
:
The columns option
and category
can be NULL
. And NULL
is to be considered equal in those cases. Your current unique indexes do not consider NULL
values to be equal, hence do not enforce your requirements. This creates ambiguities even before you start to merge. The NULL
value is no good for what you are trying to implement. Working around this will create a lot more work and additional points of failure. Consider using a special value instead of NULL
and everything falls into place. You were considering -1
. Anything that naturally makes sense for your actual data type and the nature of the attribute.
That said, the DELETE
has an additional, subtly hidden problem: it would try to delete orphan rows as many times as there are matches on company
in tableB
. Nothing breaks since excess attempts do nothing, but it's needlessly expensive. Use EXISTS
twice instead:
DELETE FROM tableA a
WHERE EXISTS (
SELECT FROM tableB b
WHERE a.company = b.company
)
AND NOT EXISTS (
SELECT FROM tableB b
WHERE (a.company, a.option, a.category) IS NOT DISTINCT FROM
(b.company, b.option, b.category)
);
If you insist on working with NULL
values, splitting the UPSERT into UPDATE
, followed by INSERT ... ON CONFLICT DO NOTHING
would be the workaround. Simpler and cheaper if you don't have concurrent writes to the table. ON CONFLICT DO NOTHING
works without specifying a conflict target, so you could implement your requirements with several partial indexes and get this to work. The manual:
For
ON CONFLICT DO NOTHING
, it is optional to specify aconflict_target
; when omitted, conflicts with all usable constraints (and unique indexes) are handled. ForON CONFLICT DO UPDATE
, aconflict_target
must be provided.
But if you fix your schema with a working UNIQUE
index or constraint, the UPSERT you already have serves nicely.
And make sure there are no concurrent writes to the table or you may face race conditions and / or deadlocks unless you do more ...
Upvotes: 1