Using MAX and GROUP BY with Common Table Expressions(CTEs) Postgresql

Question

I'm trying to get pivot table using WITH clause, but i'm stuck on how to return multiple or duplicate rows using MAX().

Here is my original query:

SELECT b.detail_id, a.sampling_date, a.sampling_area, 
        b.sampling_point, b.sampling_type,
        b.ha_tpc, b.ha_entero, b.ha_ecoli, b.ha_salmonella
FROM tbl_header a
    JOIN tbl_detail b ON a.headerid = b.headerid
WHERE 
    a.sampling_date = '2016-12-09' AND 
    a.sampling_area = 'CMP' AND 
    (b.sampling_point ~* '.*(flex).*' OR b.sampling_point ~* '.*(HPM).*' OR b.sampling_point ~* '.*(fr).*') AND 
    (b.sampling_type = 'Personil' OR b.sampling_type = 'Equipment')

And here is the results:

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
detail_id  |   sampling_date  |   sampling_area    |       sampling_point           |   sampling_type   |    ha_tpc   |   ha_entero  |  ha_ecoli      |  ha_salmonella  |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
24243      |    2016-12-09    |     CMP            |    BOIL013 (OPERATOR ENFLEX)   |    Personil       |     500     |    50        |     Abs        |     Abs            |
24289      |    2016-12-09    |     CMP            |    MP115 (OPR ENFLEX)          |    Personil       |     300     |    50        |     Abs        |     Abs            |
24284      |    2016-12-09    |     CMP            |    WT033 (FR)                  |    Personil       |     250     |    50        |     Abs        |     Abs            |

With those records i try to create a pivot query like the following:

WITH tmp_date AS (
    SELECT sampling_date.sampling_date::date AS sampling_date
        FROM generate_series(
            (( 
                SELECT min(tbl_header.sampling_date) AS min
                FROM tbl_header
            ))::timestamp with time zone, 
            (( 
                SELECT max(tbl_header.sampling_date) AS max
                FROM tbl_header
            ))::timestamp with time zone, '1 day'::interval
        ) sampling_date(sampling_date)
), 

tmp_detail AS (
    SELECT a.sampling_date, a.sampling_area, 
           b.detail_id, b.sampling_point, b.sampling_type,
           b.ha_tpc, b.ha_entero, b.ha_ecoli, b.ha_salmonella
    FROM tbl_header a
        JOIN tbl_detail b ON a.headerid = b.headerid
), 

resulttable AS (
    SELECT tmp_date.sampling_date, tmp_detail.sampling_point, tmp_detail.sampling_type, 

        -- Case of Filling Room
        CASE
            WHEN tmp_detail.sampling_point ~* '.*(fr).*' AND tmp_detail.sampling_type = 'Personil' THEN tmp_detail.sampling_point
            ELSE NULL
        END AS fr_name, 
        CASE
            WHEN tmp_detail.sampling_point ~* '.*(fr).*' AND tmp_detail.sampling_type = 'Personil' THEN tmp_detail.ha_tpc
            ELSE NULL
        END AS fr_tpc, 
        CASE
            WHEN tmp_detail.sampling_point ~* '.*(fr).*' AND tmp_detail.sampling_type = 'Personil' THEN tmp_detail.ha_entero
            ELSE NULL
        END AS fr_entero, 
        CASE
            WHEN tmp_detail.sampling_point ~* '.*(fr).*' AND tmp_detail.sampling_type = 'Personil' THEN tmp_detail.ha_ecoli
            ELSE NULL
        END AS fr_ecoli, 
        CASE
            WHEN tmp_detail.sampling_point ~* '.*(fr).*' AND tmp_detail.sampling_type = 'Personil' THEN tmp_detail.ha_salmonella
            ELSE NULL
        END AS fr_salmo, 

        -- Case of Hopper Auger Filling 
        CASE
            WHEN (tmp_detail.sampling_point ~* '.*(flex).*' OR tmp_detail.sampling_point ~* '.*(HPM).*') AND tmp_detail.sampling_type = 'Personil' THEN tmp_detail.sampling_point
            ELSE NULL
        END AS hpm_name, 
        CASE
            WHEN (tmp_detail.sampling_point ~* '.*(flex).*' OR tmp_detail.sampling_point ~* '.*(HPM).*') AND tmp_detail.sampling_type = 'Personil' THEN tmp_detail.ha_tpc
            ELSE NULL
        END AS hpm_tpc, 
        CASE
            WHEN (tmp_detail.sampling_point ~* '.*(flex).*' OR tmp_detail.sampling_point ~* '.*(HPM).*') AND tmp_detail.sampling_type = 'Personil' THEN tmp_detail.ha_entero
            ELSE NULL
        END AS hpm_entero, 
        CASE
            WHEN (tmp_detail.sampling_point ~* '.*(flex).*' OR tmp_detail.sampling_point ~* '.*(HPM).*') AND tmp_detail.sampling_type = 'Personil' THEN tmp_detail.ha_ecoli
            ELSE NULL
        END AS hpm_ecoli, 
        CASE
            WHEN (tmp_detail.sampling_point ~* '.*(flex).*' OR tmp_detail.sampling_point ~* '.*(HPM).*') AND tmp_detail.sampling_type = 'Personil' THEN tmp_detail.ha_salmonella
            ELSE NULL
        END AS hpm_salmo

    FROM tmp_date
        FULL JOIN tmp_detail ON tmp_date.sampling_date = tmp_detail.sampling_date
    WHERE 
        tmp_detail.sampling_area = 'CMP' AND 
        tmp_detail.sampling_type = 'Personil' AND 
        (
            tmp_detail.sampling_point ~* ANY ( VALUES  ('(fr)'), ('(flex)'), ('(HPM)') )
        )
)

SELECT 
    resulttable.sampling_date, 
    max(resulttable.sampling_type) AS sampling_type, 

    max(resulttable.fr_name) AS fr_name, 
    max(resulttable.fr_tpc) AS fr_tpc, 
    max(resulttable.fr_entero) AS fr_entero, 
    max(resulttable.fr_ecoli) AS fr_ecoli, 
    max(resulttable.fr_salmo) AS fr_salmonella, 

    max(resulttable.hpm_name) AS hopper_name, 
    max(resulttable.hpm_tpc) AS hopper_tpc, 
    max(resulttable.hpm_entero) AS hopper_entero, 
    max(resulttable.hpm_ecoli) AS hopper_ecoli, 
    max(resulttable.hpm_salmo) AS hopper_salmonella 

FROM resulttable
    GROUP BY resulttable.sampling_date ORDER BY resulttable.sampling_date;

I don't know why those query only return 1 rows, is it because i used GROUP BY?:

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
sampling_date   | sampling_type  |         fr_name                | fr_tpc  |   fr_entero  |   fr_ecoli |   fr_salmonella  |         hopper_name          |   hopper_tpc  |   hopper_entero   |  hopper_ecoli  |   hopper_salmonella  |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2016-12-09     |     Personil    |        WT033 (FR)               |   250   |   50         |     Abs    |      Abs         |  BOIL013 (OPERATOR ENFLEX)  |     500       |     50            |     Abs        |      Abs             |

I actually want to get a result like the following:

                                 =====================         Case of Filling Room            =========================== | =====================            Case of Hopper Auger Filling                 ============================
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
sampling_date   | sampling_type  |         fr_name                | fr_tpc  |   fr_entero  |   fr_ecoli |   fr_salmonella  |         hopper_name          |   hopper_tpc  |   hopper_entero   |  hopper_ecoli  |   hopper_salmonella  |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2016-12-09     |     Personil    |        WT033 (FR)              |   250   |   50         |     Abs    |      Abs         |  BOIL013 (OPERATOR ENFLEX)   |     500       |     50            |     Abs        |      Abs             |
2016-12-09     |     Personil    |                                |         |              |            |                  |  MP115 (OPR ENFLEX)          |     300       |     50            |     Abs        |      Abs             |

Any help would be appreciated, thank you.

Chris Travers · Accepted Answer

Your problem is that your GROUP BY clause specifies criteria that is not specific enough for your needs. Since you only have one sampling date, you get only one row out.

Reading your output, I think you want to add hopper_name to the GROUP BY terms. However it is not 100% clear what you are trying to do so understanding this clause and other alternatives may be important.

GROUP BY without grouping sets

This is currently what you are doing. It says, basically, for every row which is unique across this criteria, return one row out. This is probably what you want. So make sure you have here GROUP BY referencing exactly what you want.

Alternatives: Grouping Sets with Rollup and cube

A complex solution to be aware of is that PostgreSQL allows you to have multiple grouping sets along with rolling up based on criteria. You might want to look at ROLLUP or CUBE for this if GROUP BY does not get you where you want to go. However, this complicates things a bit in retrieving the data because you want to get a bit map of your grouped columns. However since you are trying to do a pivot table, that may be worth noting for further steps forward (as well as pointing other to as well).

Using MAX and GROUP BY with Common Table Expressions(CTEs) Postgresql

Answers (1)

Related Questions