Nitseg
Nitseg

Reputation: 1277

Joining aggregated results from different tables in a single SQL request

I need to determine the best way to make multiple aggregations on different tables that are joined in a SQL request.

Given the following tables to manage an hotel, the room service planning, client tips and client evaluation:

Room
 id | name 
----+------
  1 | 101
  2 | 102
  3 | 103
  4 | 201
  5 | 202
  6 | 203

housekeeper
 id |   name   | age 
----+----------+-----
  1 | John     |  48
  2 | Veronica |  25
  3 | Bob      |  19

room_service_planning
 id |    date    | room_id | housekeeper_id 
----+------------+---------+----------------
  1 | 2014-11-01 |       3 |              2
  2 | 2014-11-01 |       1 |              2
  3 | 2014-11-02 |       5 |              1

tips
 id | amount | housekeeper_id 
----+--------+----------------
  1 | 5,00 € |              1
  2 | 2,00 € |              3
  3 | 2,00 € |              1
  4 | 3,00 € |              3


client_eval
 id | good_eval | housekeeper_id 
----+-----------+----------------
  1 | t         |              1
  2 | f         |              1
  3 | t         |              2
  4 | t         |              2

The manager wants to know for each planned room service, who is assigned to it, the tips sum, number of client evaluations and the number of positive client evaluations this employee got in his career.

The expected result when looking for room service between 2014-11-01 and 2014-11-02 is something like:

 id |    date    | room_id | housekeeper_id |  hk_name | hk_tips_sum | hk_tot_eval | hk_pos_eval 
----+------------+---------+----------------+----------+-------------+-------------+-------------
  1 | 2014-11-01 |       3 |              2   Veronica                           2             2
  2 | 2014-11-01 |       1 |              2   Veronica                           2             2
  3 | 2014-11-02 |       5 |              1       John        7,00 €             2             1

The solutions I explored to get this result:

Solution 1:

SELECT temp2.id as id, temp2.date as date, temp2.room_id as room_id, 
temp2.housekeeper_id as housekeeper_id, temp2.hk_name as hk_name, 
temp2.hk_tips_sum as hk_tips_sum, temp2.hk_tot_eval as hk_tot_eval, 
count(1) as hk_post_eval
FROM

    (
    SELECT temp.id as id, temp.date as date, temp.room_id as room_id, 
    temp.housekeeper_id as housekeeper_id, temp.hk_name as hk_name, 
    temp.hk_tips_sum as hk_tips_sum, count(1) as hk_tot_eval

    FROM

        (SELECT rsp.id as id, rsp.date as date, rsp.room_id as room_id, 
        rsp.housekeeper_id as housekeeper_id, hk.name as hk_name, 
        sum(t.amount) as hk_tips_sum
        FROM room_service_planning rsp
        INNER JOIN housekeeper hk 
            ON rsp.date>='2014-11-01' 
            AND rsp.date<='2014-11-02' 
            AND hk.id=rsp.housekeeper_id
        LEFT JOIN tips t
            ON t.housekeeper_id=hk.id
        GROUP BY rsp.id, rsp.date, rsp.room_id, rsp.housekeeper_id, hk_name
        ) temp

    LEFT JOIN client_eval ce_tot
        ON ce_tot.housekeeper_id=temp.housekeeper_id

    GROUP BY temp.id, temp.date, temp.room_id, temp.housekeeper_id, 
        temp.hk_name, temp.hk_tips_sum

    ) temp2

LEFT JOIN client_eval ce_pos
    ON ce_pos.housekeeper_id=temp2.housekeeper_id
    AND ce_pos.good_eval='t'

GROUP BY temp2.id, temp2.date, temp2.room_id, temp2.housekeeper_id, 
temp2.hk_name, temp2.hk_tips_sum, temp2.hk_tot_eval; 

Note: this is based on a "group by to aggregate" then "join with next table" then "group by to aggregate" then "join with next table" and so on... . This is working but very heavy to write and difficult to read. I'm not satisfied with this solution.

Solution 2:

SELECT rsp.id as id, rsp.date as date, rsp.room_id as room_id,
rsp.housekeeper_id as houkeeper_id, hk.name as hk_name, 
t.amount as hk_tips_sum, ce_tot.hk_tot_eval as hk_tot_eval, 
ce_pos.hk_pos_eval as hk_pos_eval
FROM room_service_planning rsp
INNER JOIN housekeeper hk 
    ON rsp.date>='2014-11-01' 
    AND rsp.date<='2014-11-02' 
    AND hk.id=rsp.housekeeper_id
LEFT JOIN 
    (SELECT housekeeper_id, sum(amount) as amount 
     FROM tips
     GROUP BY housekeeper_id) t
    ON t.housekeeper_id=hk.id
LEFT JOIN
    (SELECT housekeeper_id, count(1) as hk_tot_eval
     FROM client_eval
     GROUP BY housekeeper_id) ce_tot
    ON ce_tot.housekeeper_id=hk.id
LEFT JOIN
    (SELECT housekeeper_id, count(good_eval) as hk_pos_eval
     FROM client_eval
     WHERE good_eval='t'
     GROUP BY housekeeper_id) ce_pos
    ON ce_pos.housekeeper_id=hk.id;

Note: this solution is more readable but I'm wondering what happens is the number of records grows for 'tips' or 'client_eval'. Let'ss imagine there are millions of tips and millions of client eval in the hotel. That means that we will do millions of sums and counts and then we will just select the few ones that are needed. This is waste of resources and can lead to very long delays.

Conclusion: Despite I found two different ways to achieve my goal I'm not satisfied with them.

What smarter and more efficient solution could you suggest to solve this problem?

Upvotes: 1

Views: 119

Answers (2)

R&#233;my  Baron
R&#233;my Baron

Reputation: 1399

try this :

select * FROM (
      room_service_planning rsp
      INNER JOIN housekeeper hk on hk.id=rsp.housekeeper_id
      ) 
left join (SELECT housekeeper_id,SUM(amount) hk_tips_sum from 
               tips group by 1) tips using (housekeeper_id)
left join (SELECT housekeeper_id,COUNT(*) hk_eval_count,
                  count(NULLIF(good_eval,false)) hk_positive_eval_count 
           from client_eval group by 1) evals  using (housekeeper_id)
where rsp.date>='2014-11-01' and rsp.date<='2014-11-02' 

Upvotes: 0

Bryan Walker
Bryan Walker

Reputation: 921

I see a couple of things here. I haven't actually seriously utilized postgresql in a while, so I hope I'm not totally off base.

If you want to just get the aggregate information associated with the housekeepers to join to the daily schedule, you might want to consider creating a view that encapsulates housekeeper id, total tips, total evals, and total positive evals. That should take advantage of any caching at the server level and reduce the number of function calls necessary.

If you only want to get the info for the one housekeeper you need, you could do a subselect in your query:

SELECT 
    rsp.id as id, rsp.date as date, rsp.room_id as room_id,
    rsp.housekeeper_id as housekeeper_id, hk.name as hk_name, 
        (SELECT SUM(t.amount) from tips where housekeeper_id = rsp.housekeeper_id) as hk_tips_sum,
        (SELECT COUNT(1) from client_eval where housekeeper_id = rsp.housekeeper_id) as hk_eval_count,
        (SELECT COUNT(1) from client_eval where housekeeper_id = rsp.housekeeper_id and good_eval='t') as hk_positive_eval_count
   FROM room_service_planning rsp
       INNER JOIN housekeeper hk 
            ON rsp.date>='2014-11-01' 
            AND rsp.date<='2014-11-02' 
            AND hk.id=rsp.housekeeper_id

That'll only calculate the aggregates where they're needed if the view is overkill.

Finally, is it important which shift the tips/evaluations come from? It might make sense to foreign key them to the room_service_planning table instead of the housekeeper table, or in addition to the housekeeper table if you won't always know which guest gave the tip/review.

Upvotes: 1

Related Questions