Richard Walker
Richard Walker

Reputation: 169

Getting distinct highest revision number

Background

I'm struggling to write a query to select a record with the highest revision number for each id across 3 tables. the worksite id is shared across all 3 tables so a query for worksitedraft.id = 103 and worksitesubmission.id = 103 would each return the same worksite in a different state.

tables

All 3 tables worksite, worksiteDraft, worksiteSubmission have the columns:

id, name, reference, majorrevisionnumber, minorrevisionnumber

Essentially I need to union all three tables together then distinct the results but make sure that for each result it is the record with the highest revision number from across the 3 tables.

Current solution

So far I have the following (working) solution but it's pretty nasty and feels like there must be a better way.

Create a view of all worksites across the 3 tables unioned together:

CREATE VIEW allworksites AS
SELECT * FROM 
(
    SELECT id, name, reference, 'worksiteDraft' as type, CONCAT(majorrevisionnumber, '.', minorrevisionnumber) as revisionnumber
    FROM worksitedraft
    UNION
    SELECT id, name, reference, 'worksiteSubmission' as type, CONCAT(majorrevisionnumber, '.', minorrevisionnumber) as revisionnumber
    FROM worksitesubmission
    UNION
    SELECT id, name, reference, 'worksite' as type, CONCAT(majorrevisionnumber, '.', minorrevisionnumber) as revisionnumber
    FROM worksite
) as allworksites;

Then use a nested select to calculate the type via ordering by major/minor revision number and limiting to 1

SELECT * FROM allworksites
WHERE id = 106
AND type = 

(
    SELECT type FROM
    (
        SELECT
            id,
            type,
            revisionnumber
        FROM 
        (
            SELECT 'worksiteDraft' as type, CONCAT(majorrevisionnumber, '.', minorrevisionnumber) as revisionnumber
            FROM worksitedraft
            WHERE id = allworksites.id
            UNION ALL
            SELECT 'worksiteSubmission' as type, CONCAT(majorrevisionnumber, '.', minorrevisionnumber) as revisionnumber
            FROM worksitesubmission
            WHERE id = allworksites.id
            UNION ALL
            SELECT 'worksite' as type, CONCAT(majorrevisionnumber, '.', minorrevisionnumber) as revisionnumber
            FROM worksite
            WHERE id = allworksites.id
        ) as latestrevision
        ORDER BY revisionnumber DESC
        limit 1
    ) as latestrevisiontype
)

Question

Is there another (perhaps a totally different) better approach to querying this data?

EDIT. Added sample data and desired result as requested

Sample data

worksitedraft

| "id"| "name"          | "reference"| "majorrevisionnumber" | "minorrevisionnumber"
| 101 | "Worksite One"  | "ref-1"    | 1                     | 0
| 102 | "Worksite Two"  | "ref-2"    | 1                     | 0
| 103 | "Worksite Three"| "ref-3"    | 1                     | 0
| 104 | "Worksite Four" | "ref-4"    | 2                     | 0
| 105 | "Worksite Five" | "ref-5"    | 2                     | 0
| 106 | "Worksite Six"  | "ref-6"    | 3                     | 0

worksitesubmission

| "id"| "name"          | "reference"| "majorrevisionnumber" | "minorrevisionnumber"
| 101 | "Worksite One"  | "ref-1"    | 1                     | 1
| 102 | "Worksite Two"  | "ref-2"    | 1                     | 2
| 103 | "Worksite Three"| "ref-3"    | 1                     | 2
| 104 | "Worksite Four" | "ref-4"    | 1                     | 2
| 105 | "Worksite Five" | "ref-5"    | 2                     | 1
| 106 | "Worksite Six"  | "ref-6"    | 2                     | 2

worksite

| "id"| "name"          | "reference"| "majorrevisionnumber" | "minorrevisionnumber"
| 101 | "Worksite One"  | "ref-1"    | 1                     | 2
| 102 | "Worksite Two"  | "ref-2"    | 1                     | 3
| 103 | "Worksite Three"| "ref-3"    | 1                     | 3
| 104 | "Worksite Four" | "ref-4"    | 1                     | 2
| 105 | "Worksite Five" | "ref-5"    | 2                     | 2
| 106 | "Worksite Six"  | "ref-6"    | 2                     | 3

Desired result

| "id"| "name"          | "reference"| "majorrevisionnumber" | "minorrevisionnumber"
| 101 | "Worksite One"  | "ref-1"    | 1                     | 2
| 102 | "Worksite Two"  | "ref-2"    | 1                     | 3
| 103 | "Worksite Three"| "ref-3"    | 1                     | 3
| 104 | "Worksite Four" | "ref-4"    | 2                     | 0
| 105 | "Worksite Five" | "ref-5"    | 2                     | 2
| 106 | "Worksite Six"  | "ref-6"    | 3                     | 0

Note: I'm using postgresql 9.4

EDIT: Final solution after suggestions from @Gordon_Linoff and @a_horse_with_no_name

SELECT DISTINCT ON (id) * 
FROM (SELECT id, name, reference, 'worksiteDraft' as type, majorrevisionnumber, minorrevisionnumber
      FROM worksitedraft
      UNION ALL
      SELECT id, name, reference, 'worksiteSubmission' as type, majorrevisionnumber, minorrevisionnumber
      FROM worksitesubmission
      UNION ALL
      SELECT id, name, reference, 'worksite' as type, majorrevisionnumber, minorrevisionnumber
      FROM worksite
     ) allworksites
ORDER BY id, ARRAY[majorrevisionnumber, minorrevisionnumber]::int[] DESC;

Upvotes: 2

Views: 122

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269883

You can do this with a single query. I think the simplest method is DISTINCT ON:

SELECT DISTINCT ON (id) * 
FROM (SELECT id, name, reference, 'worksiteDraft' as type,
             CONCAT(majorrevisionnumber, '.', minorrevisionnumber) as revisionnumber
      FROM worksitedraft
      UNION ALL
      SELECT id, name, reference, 'worksiteSubmission' as type,      
             CONCAT(majorrevisionnumber, '.', minorrevisionnumber) as revisionnumber
      FROM worksitesubmission
      UNION ALL
      SELECT id, name, reference, 'worksite' as type, 
             CONCAT(majorrevisionnumber, '.', minorrevisionnumber) as revisionnumber
      FROM worksite
     ) allworksites
ORDER BY id, revisionnumber DESC;

Notes:

  • Use UNION ALL instead of UNION. UNION incurs overhead for removing duplicate values, which is unnecessary overhead.
  • Unless your revisions are formatted correctly, concatenating them may not return the latest value. (i.e., 1.9 comes after 1.12 in sort order). However, this is the logic used in the question.
  • DISTINCT ON returns one row per id, the first one encountered, based on the ORDER BY clause.
  • A view is not necessary.

EDIT: To fix the revision number ordering, just don't concatenate them:

SELECT DISTINCT ON (id) * 
FROM (SELECT id, name, reference, 'worksiteDraft' as type,
             majorrevisionnumber, minorrevisionnumber
      FROM worksitedraft
      UNION ALL
      SELECT id, name, reference, 'worksiteSubmission' as type,      
             majorrevisionnumber, minorrevisionnumber
      FROM worksitesubmission
      UNION ALL
      SELECT id, name, reference, 'worksite' as type, 
             majorrevisionnumber, minorrevisionnumber
      FROM worksite
     ) allworksites
ORDER BY id, majorrevisionnumber DESC, minorrevisionnumber DESC;

Upvotes: 2

Related Questions