Reputation: 333

Postgres LEFT JOIN and data manipulation

I've got a boring issue to solve (hope it is hard only for me haha), as follows:

I have a PostgreSQL database with many tables. These tables are updated daily by a Perl Script. The table that interests for my problem follows the pattern below:

   ID  |  Central  |     ts     |  Country   |  Name   | Column3 | Column4 | Column5 |
------------------------------------------------------------------------------------------

There isn't a unique column primary key that identify the rows unicaly... Instead, I can see a BTree in the Perl script built with "ID-Central-ts" acting as a PK. "ts" is a timestamp generated by the script, and there are always 3 ts in the DB, so it stores every "central-ID" row for the past 3 days.

So, what I want: Letting go "Country" and "Name" columns (these columns may differ even in the same ID-central-ts without problems, or even repeat themselves), one "ID-Central-ts" shouldn't have different column' values from those shown in a specific central. I need a Query that shows me these values that mismatch from the right central, for the LAST timestamp added (The biggest number).

I mean: If, for ID 01, the "default-central" says that values for "column3", "column4" and "column5" need to be a string with 'right' in the last "ts", any value different should be caught.

Example:

Assume that Central 'Alfa' is the "default Central". It stores values that need to be equal to every single "ID" in this or any other Central, for that given ID.

   ID  |  Central  |     ts     | Country  |  Name    | Column3 | Column4 | Column5 |
------------------------------------------------------------------------------------------
   01  |   Alfa    |  10000001  |   USA    |  Fairy   |  right  |  right  |  right  |
   01  |   Alfa    |  10000002  |   USA    |  Minish  |  right  |  right  |  right  | 
   01  |   Alfa    |  10000003  |   USA    |  Elf     |  right  |  right  |  right  | 

   01  |   Delta   |  10000001  |   USA    |  Goron   |  right  |  right  |  right  | 
   01  |   Delta   |  10000002  |   USA    |  Elf     |  right  |  wrong  |  right  | 
   01  |   Delta   |  10000003  |   USA    |  Acqua   |  wrong  |  right  |  right  |
.
.
.
   02  |   Alfa    |  10000001  |   BRA    |  Fairy   |  RIGHT  |  RIGHT  |  RIGHT  |
   02  |   Alfa    |  10000002  |   BRA    |  Minish  |  RIGHT  |  RIGHT  |  RIGHT  | 
   02  |   Alfa    |  10000003  |   BRA    |  Elf     |  RIGHT  |  RIGHT  |  RIGHT  | 

   02  |   Delta   |  10000001  |   BRA    |  Goron   |  WRONG  |  RIGHT  |  RIGHT  | 
   02  |   Delta   |  10000002  |   BRA    |  Elf     |  RIGHT  |  WRONG  |  RIGHT  | 
   02  |   Delta   |  10000003  |   BRA    |  Acqua   |  WRONG  |  RIGHT  |  (null) |

I need to get:

   ID  |  Central  |     ts     | Country  |  Name    | Column3 | Column4 |   Column5     |
-------------------------------------------------------------------------------------------
   01  |   Delta   |  10000003  |   USA    |  Acqua   |  wrong  |         |               |
   02  |   Delta   |  10000003  |   BRA    |  Acqua   |  WRONG  |         |  "Wrong null" |

See that even when ts 10000001 or 10000002 have wrong values, they're not taken in. Also notice that when there are nulls where should existe some value, i need to write something to show that this null shouldn't exist.

Can anyone please take a look? I've managed to create a view to get the values from central Alfa, but I cant' figure a LEFT JOIN or ways to create these rules of writing the "wrong null" thing or how to disconsider the lower ts's.

Any help will be highly appreciated.

Upvotes: 0

Answers (3)

ZeldaElf

Reputation: 333

I've came to an answer with a LEFT JOIN, which includes every case I wanted.

Thanks a lot for every piece of advice, and sorry for not accepting any of the answers that came before mine... Maybe I have not been fully clear, but my answer brings up the exact response to my problem.

I won't change the query to fit the column names I used as example before, as I'm afraid I could make it wrong. Instead of creating lots of ANDs, I decided to get every difference column by column and join all of them of them in a FULL OUTER JOIN, later.

Follows my first query, wich gets values that differ from a given central.

SELECT Test_Configs.central, Test_Configs.imsi,
       CASE Test_Configs.mapver WHEN '' THEN '-'
           ELSE COALESCE(Test_Configs.mapver, '-') 
       END 

FROM config_imsis_centrais AS Default_Configs              -- Valores padrão da central correta
    LEFT JOIN config_imsis_centrais AS Test_Configs        -- Valores das centrais a serem testadas
        ON Default_Configs.central = 'CENTRAL_USED_AS_EXAMPLE'

        AND Default_Configs.ts = (SELECT MAX(ts) FROM config_imsis_centrais)
        AND Default_Configs.imsi = Test_Configs.imsi
        AND Default_Configs.ts = Test_Configs.ts
        AND Test_Configs.central <> Default_Configs.central
WHERE (                                                    -- Análise:
            COALESCE(Default_Configs.mapver, 'null') <> COALESCE(Test_Configs.mapver, 'null') AND

            Test_Configs.central <> ''
       )

My FULL OUTER Join is made by joining every potencial table, using central and "imsi", (that works like the ID in my example). Follows:

SELECT central, imsi, mapver, camel, nrrg
    FROM
        vw_erros_mgisp_mapver 
        FULL OUTER JOIN
        vw_erros_mgisp_camel USING (central, imsi)
        FULL OUTER JOIN
        vw_erros_mgisp_nrrg USING (central, imsi)
    ORDER BY central, imsi

That's it. Thanks a lot to everybody, and sorry not accepting your hard working answers, I just think it would be nicier for someone with the same problem to study a better solution.

Cheers!

Upvotes: 0

Robert N

Reputation: 1174

The way I would go about this would be with a self-join:

SELECT t.*
FROM theTable AS m -- values from the "master" central
  INNER JOIN theTable AS t -- values from the central to test
     ON m.Central = 'ALFA'
    AND m.ts = (SELECT MAX(ts) FROM theTable)
    AND m.ID = t.ID
    AND m.ts = t.ts
    AND t.Central <> m.Central
    AND (
      -- we assume that values in the "master" central cannot be null or blank
      m.Column3 <> coalesce(t.Column3, '') OR 
      m.Column4 <> coalesce(t.Column4, '') OR 
      m.Column5 <> coalesce(t.Column5, '')
    )

In a situation like this, you could also use CTEs, which some people find more readable:

WITH MaxTimestamp AS (
  SELECT MAX(tx) value FROM theTable
),
MasterValues AS (
  SELECT * FROM theTable WHERE Central = 'ALFA' AND ts = (SELECT value FROM MaxTimestamp)
), 
TestValues AS (
  SELECT * FROM theTable WHERE Central <> 'ALFA' AND ts = (SELECT value FROM MaxTimestamp)
)
SELECT t.*
FROM MasterValues m
  INNER JOIN TestValues t
     ON m.ID = t.ID
    AND (
      -- we assume that values in the "master" central cannot be null or blank
      m.Column3 <> coalesce(t.Column3, '') OR 
      m.Column4 <> coalesce(t.Column4, '') OR 
      m.Column5 <> coalesce(t.Column5, '')
    )

In either case, you could also write the whole thing as a function or anonymous block, which would allow you to specify the value of the master central as a parameter or variable, in case that is not a fixed value.

Upvotes: 1

CynicalSection

Reputation: 704

It should be something like

select 
ID
,Central
,ts
,Country
,name
,COALESCE(column3, 'wrong') AS Column3
,COALESCE(column4, 'wrong') AS Column4
,COALESCE(column5, 'wrong') AS Column5
FROM T1
WHERE 
(UPPER(Column3) <> 'RIGHT' OR UPPER(Column4) <> 'RIGHT' OR UPPER(Column5) <> 'RIGHT')
and ts = (SELECT MAX(ts) FROM T1)

Upvotes: 0

Postgres LEFT JOIN and data manipulation

Answers (3)

Related Questions