lammy
lammy

Reputation: 467

SQL Outer Join on 3 Tables in SQL

I can't seem to wrap my head around this.

The data is 3 tables (Prognosis A/B/C) with 3 matching columns - Timestamp, Value, Country. Some rows can be missing in any of these tables. I want to do a full outer join on all three tables based on the column Timestamp. The data looks as follow:

+-----------------------------+-------+---------+
| Timestamp                   | Value | Country |
+-----------------------------+-------+---------+
| 2020-05-05 21:00:00.0000000 | 19,02 | FR      |
+-----------------------------+-------+---------+
| 2020-05-05 20:00:00.0000000 | 17,85 | FR      |
+-----------------------------+-------+---------+
| 2020-05-05 19:00:00.0000000 | 17,71 | FR      |
+-----------------------------+-------+---------+
            ... omitted for brevity ...
+-----------------------------+-------+---------+
| 2020-05-05 01:00:00.0000000 | 3,33  | FR      |
+-----------------------------+-------+---------+
| 2020-05-05 00:00:00.0000000 | 5,89  | FR      |
+-----------------------------+-------+---------+
| 2020-05-04 23:00:00.0000000 | 7,62  | FR      |
+-----------------------------+-------+---------+
| 2020-05-04 22:00:00.0000000 | 11,79 | FR      |
+-----------------------------+-------+---------+

The expected outcome is supposed to look like this. Notice the null Value if no corresponding row has been found in a table.

+-----------------------------+--------+-------+-------+
| Timestamp                   | ValueA | ValueB| ValueC|
+-----------------------------+--------+-------+-------+
| 2020-05-05 21:00:00.0000000 | -28,78 | 38,89 | 80    |
+-----------------------------+--------+-------+-------+
| 2020-05-05 20:00:00.0000000 | 23,78  | 19,02 | 120   |
+-----------------------------+--------+-------+-------+
| 2020-05-05 19:00:00.0000000 | -13,78 | 32,36 | 71    |
+-----------------------------+--------+-------+-------+
                    ... omited for brevity ...
+-----------------------------+--------+-------+-------+
| 2020-05-04 18:00:00.0000000 | 18,65  | 31,24 | 51    |
+-----------------------------+--------+-------+-------+
| 2020-05-04 17:00:00.0000000 | 11,51  | null  | 14     |
+-----------------------------+--------+-------+-------+
| 2020-05-04 16:00:00.0000000 | 1,51   | 34,07  | 23    |
+-----------------------------+--------+-------+-------+
| 2020-05-04 15:00:00.0000000 | 21,51  | null  | null  |
+-----------------------------+--------+-------+-------+

Here is what I tried:

Solution A:

SELECT  A.Timestamp,
        A.Value, 
        B.Value,
        C.Value
FROM(
    (SELECT Timestamp, Value, Country FROM [dbo].PrognosisA) A
    FULL OUTER JOIN 
    (SELECT Timestamp, Value, Country FROM [dbo].PrognosisB) B on A.Timestamp = B.Timestamp
    FULL OUTER JOIN 
    (SELECT Timestamp, Value, Country FROM [dbo].PrognosisC) C on ISNULL(A.Timestamp, B.Timestamp) = C.Timestamp
)
WHERE  A.Country = 'FR' AND B.Country = 'FR' AND C.Country = 'FR'
ORDER BY A.Timestamp DESC

However, this solution returns duplicated timestamps.

Solution B:

select allTimestamp.Timestamp, [dbo].PrognosisB.Value, [dbo].PrognosisA.Value, [dbo].PrognosisC.Value
from (select Timestamp, Country from [dbo].PrognosisB union 
      select Timestamp, Country from [dbo].PrognosisA union
      select Timestamp, Country from [dbo].PrognosisC
     ) allTimestamp left outer join 
     [dbo].PrognosisB on allTimestamp.Timestamp = [dbo].PrognosisB.Timestamp left outer join
     [dbo].PrognosisA on allTimestamp.Timestamp = [dbo].PrognosisA.Timestamp left outer join
     [dbo].PrognosisC on allTimestamp.Timestamp = [dbo].PrognosisC.Timestamp
WHERE allTimestamp.Country = 'FR'
ORDER BY allTimestamp.Timestamp DESC;

This solution returns even more duplicates.

So where am I going wrong?

Upvotes: 0

Views: 78

Answers (3)

Luis LL
Luis LL

Reputation: 2993

As long as you have only one row per TimeStamp per country, you can try something like this.

;WITH TimeStamps AS (
           SELECT Timestamp, County FROM [dbo].PrognosisA  
     union SELECT Timestamp, County FROM [dbo].PrognosisB
     union SELECT Timestamp, County FROM [dbo].PrognosisC
 )

SELECT TimeStamps.Timestamp,
       A.Value,
       B.Value,
       C.Value
FROM   TimeStamps
       LEFT JOIN [dbo].PrognosisA A
            ON  TimeStamps.TimeStamp = A.Timestamp AND TimeStamps.Country = A.Country
       LEFT JOIN [dbo].PrognosisB B
            ON  TimeStamps.TimeStamp = B.Timestamp AND TimeStamps.Country = B.Country
       LEFT JOIN [dbo].PrognosisC C
            ON  TimeStamps.TimeStamp = C.Timestamp AND TimeStamps.Country = C.Country
WHERE  TimeStamps.Country = 'FR'
ORDER BY
       TimeStamps.Timestamp DESC

If you have 2 or more [TimeStamp, Country] you'll get more and more duplicates.

Upvotes: 0

Philip Kelley
Philip Kelley

Reputation: 40359

I set up test tables, ran your queries, and built my own. My conclusions: Your first query is a bit problematic, your second is valid, both are perhaps a bit over-convoluted but functional, and the comments are right: somewhere, somehow, your data is not what you think it is.

I set up testing data as follows, using INT instead of datetime for clarity:

/*

DROP TABLE PrognosisA
DROP TABLE PrognosisB
DROP TABLE PrognosisC

*/

CREATE TABLE PrognosisA
 (
   Timestamp  int            not null  primary key
  ,Value      decimal(10,2)  not null
  ,Country    char(2)        not null
 )

CREATE TABLE PrognosisB
 (
   Timestamp  int            not null  primary key
  ,Value      decimal(10,2)  not null
  ,Country    char(2)        not null
 )

CREATE TABLE PrognosisC
 (
   Timestamp  int            not null  primary key
  ,Value      decimal(10,2)  not null
  ,Country    char(2)        not null
 )

I then set up data that covers all possible combinations of “ID in one, some, or all tables”:

INSERT PrognosisA (Timestamp, Value, Country) values
  (1, 1.11, 'FR')
 ,(2, 2.21, 'FR')
 ,(3, 3.31, 'FR')
 ,(5, 5.51, 'FR')

INSERT PrognosisB (Timestamp, Value, Country) values
  (1, 1.12, 'FR')
 ,(2, 2.22, 'FR')
 ,(4, 4.42, 'FR')
 ,(6, 6.62, 'FR')

INSERT PrognosisC (Timestamp, Value, Country) values
  (1, 1.13, 'FR')
 ,(3, 3.33, 'FR')
 ,(4, 4.43, 'FR')
 ,(7, 7.73, 'FR')

After a few dead-ends (full outer joins are always a challenge), I ended up with the following:

SELECT
   coalesce(pa.Timestamp, pb.Timestamp, pc.Timestamp)   Timestamp
  ,pa.Timestamp
  ,pb.Timestamp
  ,pc.Timestamp
  ,pa.Value
  ,pb.Value
  ,pc.Value
 from PrognosisA  pa
  full outer join PrognosisB  pb
   on pb.Timestamp = pa.Timestamp
  full outer join PrognosisC  pc
   on pc.Timestamp = isnull(pa.Timestamp, pb.TimeStamp)
 where coalesce(pa.Country, pb.Country, pc.Country) = 'FR'
 order by 1  --  Only for testing!

This returns all the data, with all possible permutations of present/not present and no duplicate rows.

Two recommendations:

  • Review your Timstamp data—and it’s datatypes—very carefully. Datetimes, particularly when they go down to Nths of a second, can be very fussy to work with, and would not be my first choice for a join column. Perhaps the precision varies between tables and something is getting truncated?

  • The data as described could be much better normalized. Make it one table, add a column for “WhichPrognosis” (A, B, C, whatever), and go from there. Need to return one "value" columns per prognosis? Either use a pivot statement (awkward), or let the calling application sort out the formatting of data.

Upvotes: 2

avery_larry
avery_larry

Reputation: 2135

I would suggest moving the Country comparison to the join clause like this:

untested

SELECT  isnull(A.Timestamp, isnull(B.Timestamp, C.Timestamp)) Timestamp
   ,A.Value ValueA, B.Value ValueB, C.Value ValueC
FROM ( -- subquery to get only records where Country = 'FR'
   SELECT *
   FROM [dbo].PrognosisA
   WHERE Country = 'FR'  -- limit to records where Country = 'FR'
) A
FULL OUTER JOIN [dbo].PrognosisB) B
   on A.Timestamp = B.Timestamp
   and B.Country = 'FR' -- limit to records where Country = 'FR'
FULL OUTER JOIN [dbo].PrognosisC C
   on ISNULL(A.Timestamp, B.Timestamp) = C.Timestamp
   and C.Country = 'FR' -- limit to records where Country = 'FR'
ORDER BY isnull(A.Timestamp, isnull(B.Timestamp, C.Timestamp)) DESC

If any of the 3 tables has any duplicates of Timestamp (where country = 'fr') then of course your joined table will have duplicates.

If you want to eliminate duplicates, you can use DISTINCT or GROUP BY in a number of different ways.

Upvotes: 0

Related Questions