CodyBugstein
CodyBugstein

Reputation: 23322

Including an attribute in SELECT without having to include it in GROUP BY

I have a list of players who have hit Grand Slams this season, so far.

___________________________________________________
| Player Name | dateOfGrandSlam | distance | home |
---------------------------------------------------
|  Griffin    |    9-14-2013   |    413   | true |
|  Griffin    |    10-1-2013   |    371   | false|
|  Simpson    |    5-15-2013   |    413   | true |
|   Reid      |    7-1-2013    |    362   | true |
|   Reid      |    7-4-2013    |    363   | true |
|   Reid      |    9-28-2013   |    388   | true |
|  Peavis     |    8-14-2013   |    466   | false|

I want to get a list of players whose most recent grandslam was at home. If their most recent grandslam wasn't at home, I don't want them to show up on my list. This means, I need to select the player and group by the player and select the max date from that group. In that list, I also have to include home/away information so I can pick out the ones that were not at home.

However I'm having a problem. Because in order to select the attribute home, I also need to include home in the GROUP BY clause. For example:

SELECT playerName, MAX(date), distance, home 
FROM grandslams 
GROUP BY playerName, distance, home

The problem is that this returns a table with the most recent home and the most recent away grandslams.


| Player Name | dateOfGrandSlam | distance | home |
---------------------------------------------------
|  Griffin    |    9-14-2013   |    413   | true |
|  Griffin    |    10-1-2013   |    371   | false|
|  Simpson    |    5-15-2013   |    413   | true |
|   Reid      |    9-28-2013   |    388   | true |

This is not what I want - I want ONLY the most recent home grandslams IF there were no away grandslams more recently.

I want this result:

___________________________________________________
| Player Name | dateOfGrandSlam | distance | home |
---------------------------------------------------
|  Simpson    |    5-15-2013   |    413   | true |
|   Reid      |    9-28-2013   |    388   | true |

Essentially I need a way to perform the query that just gets me the most recent grandslam per player, tacks on the home attribute (without having to group by it so I don't get his most recent home and his most recent away) and then can be easily filtered in an outer query.

In other words, I need to get the result from

SELECT playerName, MAX(date), distance
FROM grandSlams
GROUP BY playerName, distance

and attach the home attribute to it.

Upvotes: 1

Views: 2237

Answers (5)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 656844

The basic query would be simplest and fastest with DISTINCT ON, which is a Postgres specific extension of the SQL standard DISTINCT:

SELECT DISTINCT ON (playername)
       playername, dateofgrandslam, distance, home 
FROM   grandslams 
GROUP  BY playername, dateofgrandslamDESC;

Compare performance with EXPLAIN ANALYZE. Detailed explanation in this related answer:
Select first row in each GROUP BY group?

Use a subquery to eliminate cases with the most recent grandslam not at home:

SELECT * FROM (
    SELECT DISTINCT ON (playername)
           playername, dateofgrandslam, distance, home 
    FROM   grandslams 
    GROUP  BY playername, dateofgrandslam DESC
   ) sub
WHERE  home;

Another hot contender for the performance crown (depending on what you need exactly), and purely standard SQL: a NOT EXISTS anti-semi-join. Also simpler than it sounds:

SELECT playername, dateofgrandslam, distance, home 
FROM   grandslams g
WHERE  home
AND NOT EXISTS (
   SELECT 1
   FROM   grandslams g1
   WHERE  g1.playername = g.playername
   AND    g1.dateofgrandslam > g.dateofgrandslam
   );

Aside: Using the lower case for dateofgrandslam, since unquoted identifiers are cast to lower case anyway in Postgres.

Upvotes: 0

Anon
Anon

Reputation: 10908

Break the problem down.

1) Build a query that gets you the players whose last slam was at home. I.e., their most recent home slam was their most recent slam.

SELECT
  playerName,
  MAX(dateOfGrandSlam) mostRecentGrandSlam
FROM grandSlams
GROUP BY playerName
HAVING MAX(CASE home WHEN 'true' THEN dateOfGrandSlam END) = MAX(dateOfGrandSlam)

2) Use that query to filter the original table to just the rows you want:

SELECT a.playerName, a.dateOfGrandSlam, a.distance, a.home
FROM grandSlams a
INNER JOIN (
  SELECT
    playerName,
    MAX(dateOfGrandSlam) mostRecentGrandSlam
  FROM grandSlams
  GROUP BY playerName
  HAVING MAX(CASE home WHEN 'true' THEN dateOfGrandSlam END) = MAX(dateOfGrandSlam)
) b ON (a.playerName = b.playerName AND a.dateOfGrandSlam = b.mostRecentGrandSlam)

Upvotes: 0

mrras
mrras

Reputation: 462

You can first find all of the last home runs, then join them to the original table.

SELECT g.* from (SELECT playerName, MAX(date) date
FROM grandslams 
GROUP BY playerName) a
NATURAL JOIN grandslams g
WHERE g.home = true;

If you are not a fan of natural join (for ansi's sake) you can always use

JOIN grandslams g on g.playerName = a.playerName and g.date = a.date

instead of the natural join. this would make the query ansi.

Upvotes: 0

mw90
mw90

Reputation: 132

Try this:

SELECT playerName, MAX(date), distance, home
FROM grandslams
GROUP BY playerName, distance, home
WHERE home='true'

you may need to use " or ' depending on what type of SQL you are using.

However, I am a little confused by your question due to this:

"I want to get a list of players whose most recent grandslam was at home. If their most recent grandslam wasn't at home, I don't want them to show up on my list." and then "This is not what I want - I want ONLY the most recent home grandslams IF there were no away grandslams more recently."

Are you looking for the most recent grandslam in general? or just the most recent home grandslam?

Upvotes: -1

Gordon Linoff
Gordon Linoff

Reputation: 1269883

You can find the most recent grand slam for each player by using the row_number() function to enumerate the grand slams for each player, ordered by the date in descending order. To choose the most recent, choose the one where the value is 1. Then add the condition on the grand slam being from home:

select gs.*
from (select gs.*,
             row_number() over (partition by PlayerName
                                order by dateOfGrandSlam desc) as seqnum
      from GrandSlams gs
     ) gs
where seqnum = 1 and home = true;

Upvotes: 3

Related Questions