Habib Mohammad
Habib Mohammad

Reputation: 93

SELECT DISTINCT rows that can occur only once

Let's say we are joining two tables (A and B) by some condition using left or inner join

WITH
a(id, x) AS (VALUES
    (0, 'a'),
    (1, 'a'),
    (2, 'b')
),
b(id, y) AS (VALUES
    (10, 'a'),
    (20, 'a'),
    (30, 'b')
),
pairs AS (
    SELECT a.id AS a_id, b.id AS b_id
    FROM a LEFT JOIN b ON a.x=b.y
)
SELECT * FROM pairs; -- how to modify this query to return the expected result?

The result will be 5 rows (2 * 2 + 1 * 1) Now the hard part: id from each table can appear in the result only once.

a_id|b_id|
----|----|
   0|  10|
 ->0|  20| -- a_id=0 can be picked only once
   1|->10| -- b_id=10 can be picked only once
   1|  20|
   2|  30|

-- so the expected result is:
a_id|b_id|
----|----|
   0|  10|
   1|  20|
   2|  30|

-- UPDATE: alternative result could be:
a_id|b_id|
----|----|
   0|  20|
   1|  10|
   2|  30|

Upvotes: 0

Views: 127

Answers (1)

S-Man
S-Man

Reputation: 23666

demo:db<>fiddle

After joining the tables is quite hard, I found no fast solution. I guess, that's a combinatorical problem, because the result depends on the order in which you process the data. But if the data changes, the result is completely different... Maybe someone proofs me wrong...

However, if we are able to modify the tables BEFORE joining, it would be much better:

WITH
a(id, x) AS (VALUES
    (0, 'a'),
    (1, 'a'),
    (2, 'b')
),
b(id, y) AS (VALUES
    (10, 'a'),
    (20, 'a'),
    (30, 'b')
),
pairs AS (
    SELECT a.id AS a_id, b.id AS b_id
    FROM (
        SELECT
            *,
            row_number() OVER (PARTITION BY x)
        FROM
            a
    ) a
    LEFT JOIN (
        SELECT
            *,
            row_number() OVER (PARTITION BY y)
        FROM
            b
    ) b ON a.x=b.y AND a.row_number = b.row_number
)
SELECT 
    *
FROM pairs

Now we can add a row count. So we can merge on the data AND the row count. This ensure that every data is merged only once.

Upvotes: 1

Related Questions