reptilicus
reptilicus

Reputation: 10397

SQL Full outer join or alternative solution

I'm trying to join multiple tables together using a full outer join, its getting close to the proper result but there are some duplicate rows due to the join clauses. I've got several tables with columns of id, date, value. I'm looking to get a table with one row for each id, date pair that has all the values from each of the tables.

Here is a SQLFiddle if you want to play with it.

Here's what I've got so far:


SELECT
  COALESCE(T1.ID, T2.ID, T3.ID, t4.id) AS ID,
  COALESCE(T1.event_dt, T2.event_dt, T3.event_dt, t4.event_dt) AS DATE,
  T1.AMT1, T2.AMT2, T3.AMT3, t4.AMT4
FROM T1
FULL OUTER JOIN T2
ON
  T2.id = T1.id
  AND T2.event_dt = T1.event_dt
FULL OUTER JOIN T3
ON
  T3.id = T1.id
  AND T3.event_dt = T1.event_dt
FULL OUTER JOIN T4
ON
  T4.id = T1.id
  AND T4.event_dt = T1.event_dt
ORDER BY ID, DATE

This almost works, but I get some duplicate rows when for instance T4 has an ID,event_dt pair that is not in T1 (as to be expected because thats what I'm joining on). For instance, I'll get something like:


1   April, 06 2012 00:00:00+0000    (null)  2   (null)  (null)
1   April, 06 2012 00:00:00+0000    (null)  (null)  (null)  4
1   April, 06 2012 00:00:00+0000    (null)  (null)  3   (null)

When I'm looking to get:

1   April, 06 2012 00:00:00+0000    (null)   2   3   4

Is there a way to flatten/merge those rows together, or is there a better way to go about this altogether?

Upvotes: 1

Views: 5892

Answers (4)

wildplasser
wildplasser

Reputation: 44250

(assuming the OP wants a fully symmetric outer 4-join)

WITH four AS (
        SELECT id, event_dt FROM t1
        UNION
        SELECT id, event_dt FROM t2
        UNION
        SELECT id, event_dt FROM t3
        UNION
        SELECT id, event_dt FROM t4
        )
SELECT f.id, f.event_dt
        , t1.amt1
        , t2.amt2
        , t3.amt3
        , t4.amt4
FROM four f
LEFT JOIN t1 ON t1.id = f.id AND t1.event_dt = f.event_dt
LEFT JOIN t2 ON t2.id = f.id AND t2.event_dt = f.event_dt
LEFT JOIN t3 ON t3.id = f.id AND t3.event_dt = f.event_dt
LEFT JOIN t4 ON t4.id = f.id AND t4.event_dt = f.event_dt
ORDER BY id, event_dt
        ;

Result:

 id |  event_dt  | amt1 | amt2 | amt3 | amt4 
----+------------+------+------+------+------
  1 | 2012-04-01 |    1 |      |      |     
  1 | 2012-04-02 |    1 |      |    3 |     
  1 | 2012-04-03 |    1 |      |    3 |     
  1 | 2012-04-06 |      |    2 |    3 |    4
  1 | 2012-04-07 |      |    2 |      |     
  2 | 2012-04-01 |   40 |      |      |     
  2 | 2012-04-02 |      |      |    3 |     
  2 | 2012-04-03 |      |      |    3 |     
  2 | 2012-04-04 |   40 |      |      |     
(9 rows)

BTW: after the UNION four, LEFT JOINs will do the same as FULL JOINs here (union four already has all the possible {id, event_dt} pairs)

Upvotes: 1

Taryn
Taryn

Reputation: 247760

You could always use an aggregate around the amount columns:

SELECT
  COALESCE(T1.ID, T2.ID, T3.ID, t4.id) AS ID,
  COALESCE(T1.event_dt, T2.event_dt, T3.event_dt, t4.event_dt) AS DATE,
  max(coalesce(T1.AMT1, 0)) AMT1,  -- use coalesce to replace the null with zero
  max(coalesce(T2.AMT2, 0)) AMT2, 
  max(coalesce(T3.AMT3, 0)) AMT3, 
  max(coalesce(t4.AMT4, 0)) AMT4
FROM T1
FULL OUTER JOIN T2
  ON T2.id = T1.id
  AND T2.event_dt = T1.event_dt
FULL OUTER JOIN T3
  ON T3.id = T1.id
  AND T3.event_dt = T1.event_dt
FULL OUTER JOIN T4
  ON T4.id = T1.id
  AND T4.event_dt = T1.event_dt
group by  COALESCE(T1.ID, T2.ID, T3.ID, t4.id), 
  COALESCE(T1.event_dt, T2.event_dt, T3.event_dt, t4.event_dt)
ORDER BY ID, DATE;

See Demo

Upvotes: 3

A.H.
A.H.

Reputation: 66273

I think you join-citeria is simply not what you really want. This one should do the trick:

SELECT
  COALESCE(T1.ID, T2.ID, T3.ID, t4.id) AS ID,
  COALESCE(T1.event_dt, T2.event_dt, T3.event_dt, t4.event_dt) AS DATE,
  T1.AMT1, T2.AMT2, T3.AMT3, t4.AMT4
FROM T1
FULL OUTER JOIN T2
ON
  T2.id = T1.id
  AND T2.event_dt = T1.event_dt
FULL OUTER JOIN T3
ON
  T3.id = coalesce(T1.id, T2.id)
  AND T3.event_dt = coalesce(T1.event_dt, T2.event_dt)
FULL OUTER JOIN T4
ON
  T4.id = coalesce(T1.id, T2.id, T3.id)
  AND T4.event_dt = coalesce(T1.event_dt, T2.event_dt, T3.event_dt)
ORDER BY ID, DATE

SQL-Fiddle here gives you the desired output for 2012-04-06.

Upvotes: 3

Declan_K
Declan_K

Reputation: 6826

Trap for NULLs, repalceing them with zeros, then find the MAX value in each column.

SELECT
  COALESCE(T1.ID, T2.ID, T3.ID, t4.id) AS ID,
  COALESCE(T1.event_dt, T2.event_dt, T3.event_dt, t4.event_dt) AS DATE,
  max( coalesce(T1.AMT1,0)) as amt1
, max( coalesce(T2.AMT2,0)) as amt2
, max( coalesce(T3.AMT3,0)) as amt3
, max( coalesce(t4.AMT4,0)) as amt4
FROM T1
FULL OUTER JOIN T2
ON
  T2.id = T1.id
  AND T2.event_dt = T1.event_dt
FULL OUTER JOIN T3
ON
  T3.id = T1.id
  AND T3.event_dt = T1.event_dt
FULL OUTER JOIN T4
ON
  T4.id = T1.id
  AND T4.event_dt = T1.event_dt
group by   COALESCE(T1.ID, T2.ID, T3.ID, t4.id),
  COALESCE(T1.event_dt, T2.event_dt, T3.event_dt, t4.event_dt)
ORDER BY ID, DATE

Here's the Fiddle

Upvotes: 2

Related Questions