Ben Smith
Ben Smith

Reputation: 861

Left Join with Group By

I am using PostgreSQL 9.4.

I have a table of workouts. Users can create multiple results for each workout, and a result has a score.

Given a list of workout_ids and two user_ids, I want to return the best score for each workout for each user. If the user does not have a result for that workout, I want to return a padded/null result.

SELECT "results".*, "workouts".* 
FROM "results" LEFT JOIN "workouts" ON "workouts"."id" = "results"."workout_id" 
WHERE (
  (user_id, workout_id, score) IN 
  (SELECT user_id, workout_id, MAX(score) 
    FROM results WHERE user_id IN (1, 2) AND workout_id IN (1, 2, 3) 
    GROUP BY user_id, workout_id)
) 

In this query, the left join is acting as an inner join; I'm not getting any padding if the user has not got a result for the workout. This query should always return six rows, regardless of how many results exist.

Example data:

results
user_id | workout_id | score 
-----------------------------
      1 |          1 |     10
      1 |          3 |     10
      1 |          3 |     15
      2 |          1 |      5

Desired result:

results.user_id | results.workout_id | max(results.score) | workouts.name
-------------------------------------------------------------------------
              1 |                  1 |                 10 | Squat
              1 |                  2 |               null | Bench
              1 |                  3 |                 15 | Deadlift
              2 |                  1 |                  5 | Squat
              2 |                  2 |               null | Bench
              2 |                  3 |               null | Deadlift

Upvotes: 5

Views: 15950

Answers (3)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 656291

SELECT DISTINCT ON (1, 2)
       u.user_id
     , w.id AS workout_id
     , r.score
     , w.name AS workout_name
FROM   workouts w
CROSS  JOIN (VALUES (1), (2)) u(user_id)
LEFT   JOIN  results r ON r.workout_id = w.id
                      AND r.user_id = u.user_id
WHERE  w.id IN (1, 2, 3)
ORDER  BY 1, 2, r.score DESC NULLS LAST;

Step by step explanation

  1. Form a complete Cartesian product of given workouts and users.
    Assuming the given workouts always exist.
    Assuming that not all given users have results for all given workouts.

  2. LEFT JOIN to results. All conditions go into the ON clause of the LEFT JOIN, not into the WHERE clause, which would exclude (workout_id, user_id) combinations that have no result. See:

  3. Finally pick the best result per (user_id, workout_id) with DISTINCT ON. While being at it, produce the desired sort order. See:

Depending on the size of tables and data distribution there may be faster solutions. See:

Simple version

If all you want is the maximum score for each (user_id, workout_id) combination, there is simple version:

SELECT user_id, workout_id, max(r.score) AS score
FROM        unnest('{1,2}'::int[])   u(user_id)
CROSS  JOIN unnest('{1,2,3}'::int[]) w(workout_id)
LEFT   JOIN results r USING (user_id, workout_id)
GROUP  BY 1, 2
ORDER  BY 1, 2;

db<>fiddle here
Old sqlfiddle.

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1269483

How about using distinct on or row_number()?

SELECT DISTINCT ON (r.user_id, r.workout_id) r.*, w.* 
FROM "results" r LEFT JOIN
     "workouts" w
     ON "w."id" = r."workout_id" 
WHERE r.user_id IN (1, 2) AND r.workout_id IN (1, 2, 3) 
ORDER BY r.user_id, r.workout_id, score desc;

The row_number() equivalent requires a subquery:

SELECT rw.*
FROM (SELECT r.*, w.*,
             row_number() over (partition by user_id, workout_id order by score desc) as seqnum 
      FROM "results" r LEFT JOIN
           "workouts" w
           ON "w."id" = r."workout_id" 
      WHERE r.user_id IN (1, 2) AND r.workout_id IN (1, 2, 3) 
     ) rw
WHERE seqnum = 1;

You should choose the columns more judiciously than using a *. The subquery might return errors in the case of duplicate column names.

EDIT:

You need to generate the rows first, and then the results for each. Here is one method, building on the second query:

SELECT u.user_id, w.workout_id, rw.score, rw.name
FROM (SELECT 1 as user_id UNION ALL SELECT 2) u CROSS JOIN
     (SELECT 1 as workout_id UNION ALL SELECT 2 UNION ALL SELECT 3) w LEFT JOIN
     (SELECT r.*, w.*,
             row_number() over (partition by user_id, workout_id order by score desc) as seqnum 
      FROM "results" r LEFT JOIN
           "workouts" w
           ON "w."id" = r."workout_id" 
      WHERE r.user_id IN (1, 2) AND r.workout_id IN (1, 2, 3) 
     ) rw
     ON rw.user_id = u.user_id and rw.workout_id = w.workout_id and
        rw.seqnum = 1;

Upvotes: 1

Norbert
Norbert

Reputation: 6084

The where filters out your NULL values, so that is why the result is not what you expect.

Joinint the WHERE clause results instead of filter the where clause results.

SELECT "results".*, "workouts".*,"max_score".*
FROM "results" 
LEFT JOIN "workouts" ON "workouts"."id" = "results"."workout_id"
LEFT JOIN (SELECT user_id, workout_id, MAX(score) 
    FROM results WHERE user_id IN (1, 2) AND workout_id IN (1, 2, 3) 
    GROUP BY user_id, workout_id) max_score ON workouts.workout_id=max_score.workout_id;

You need to alter the SELECT to get the correct columns.

Upvotes: 4

Related Questions