Reputation: 93
Let's say we are joining two tables (A and B) by some condition using left or inner join
WITH
a(id, x) AS (VALUES
(0, 'a'),
(1, 'a'),
(2, 'b')
),
b(id, y) AS (VALUES
(10, 'a'),
(20, 'a'),
(30, 'b')
),
pairs AS (
SELECT a.id AS a_id, b.id AS b_id
FROM a LEFT JOIN b ON a.x=b.y
)
SELECT * FROM pairs; -- how to modify this query to return the expected result?
The result will be 5 rows (2 * 2 + 1 * 1) Now the hard part: id from each table can appear in the result only once.
a_id|b_id|
----|----|
0| 10|
->0| 20| -- a_id=0 can be picked only once
1|->10| -- b_id=10 can be picked only once
1| 20|
2| 30|
-- so the expected result is:
a_id|b_id|
----|----|
0| 10|
1| 20|
2| 30|
-- UPDATE: alternative result could be:
a_id|b_id|
----|----|
0| 20|
1| 10|
2| 30|
Upvotes: 0
Views: 127
Reputation: 23666
After joining the tables is quite hard, I found no fast solution. I guess, that's a combinatorical problem, because the result depends on the order in which you process the data. But if the data changes, the result is completely different... Maybe someone proofs me wrong...
However, if we are able to modify the tables BEFORE joining, it would be much better:
WITH
a(id, x) AS (VALUES
(0, 'a'),
(1, 'a'),
(2, 'b')
),
b(id, y) AS (VALUES
(10, 'a'),
(20, 'a'),
(30, 'b')
),
pairs AS (
SELECT a.id AS a_id, b.id AS b_id
FROM (
SELECT
*,
row_number() OVER (PARTITION BY x)
FROM
a
) a
LEFT JOIN (
SELECT
*,
row_number() OVER (PARTITION BY y)
FROM
b
) b ON a.x=b.y AND a.row_number = b.row_number
)
SELECT
*
FROM pairs
Now we can add a row count. So we can merge on the data AND the row count. This ensure that every data is merged only once.
Upvotes: 1