Reputation: 23322
I have a list of players who have hit Grand Slams this season, so far.
___________________________________________________
| Player Name | dateOfGrandSlam | distance | home |
---------------------------------------------------
| Griffin | 9-14-2013 | 413 | true |
| Griffin | 10-1-2013 | 371 | false|
| Simpson | 5-15-2013 | 413 | true |
| Reid | 7-1-2013 | 362 | true |
| Reid | 7-4-2013 | 363 | true |
| Reid | 9-28-2013 | 388 | true |
| Peavis | 8-14-2013 | 466 | false|
I want to get a list of players whose most recent grandslam was at home. If their most recent grandslam wasn't at home, I don't want them to show up on my list. This means, I need to select the player and group by the player and select the max date from that group. In that list, I also have to include home/away information so I can pick out the ones that were not at home.
However I'm having a problem. Because in order to select the attribute home
, I also need to include home
in the GROUP BY
clause.
For example:
SELECT playerName, MAX(date), distance, home
FROM grandslams
GROUP BY playerName, distance, home
The problem is that this returns a table with the most recent home and the most recent away grandslams.
| Player Name | dateOfGrandSlam | distance | home |
---------------------------------------------------
| Griffin | 9-14-2013 | 413 | true |
| Griffin | 10-1-2013 | 371 | false|
| Simpson | 5-15-2013 | 413 | true |
| Reid | 9-28-2013 | 388 | true |
This is not what I want - I want ONLY the most recent home grandslams IF there were no away grandslams more recently.
I want this result:
___________________________________________________
| Player Name | dateOfGrandSlam | distance | home |
---------------------------------------------------
| Simpson | 5-15-2013 | 413 | true |
| Reid | 9-28-2013 | 388 | true |
Essentially I need a way to perform the query that just gets me the most recent grandslam per player, tacks on the home
attribute (without having to group by it so I don't get his most recent home and his most recent away) and then can be easily filtered in an outer query.
In other words, I need to get the result from
SELECT playerName, MAX(date), distance
FROM grandSlams
GROUP BY playerName, distance
and attach the home
attribute to it.
Upvotes: 1
Views: 2237
Reputation: 656844
The basic query would be simplest and fastest with DISTINCT ON
, which is a Postgres specific extension of the SQL standard DISTINCT
:
SELECT DISTINCT ON (playername)
playername, dateofgrandslam, distance, home
FROM grandslams
GROUP BY playername, dateofgrandslamDESC;
Compare performance with EXPLAIN ANALYZE
. Detailed explanation in this related answer:
Select first row in each GROUP BY group?
Use a subquery to eliminate cases with the most recent grandslam not at home:
SELECT * FROM (
SELECT DISTINCT ON (playername)
playername, dateofgrandslam, distance, home
FROM grandslams
GROUP BY playername, dateofgrandslam DESC
) sub
WHERE home;
Another hot contender for the performance crown (depending on what you need exactly), and purely standard SQL: a NOT EXISTS
anti-semi-join. Also simpler than it sounds:
SELECT playername, dateofgrandslam, distance, home
FROM grandslams g
WHERE home
AND NOT EXISTS (
SELECT 1
FROM grandslams g1
WHERE g1.playername = g.playername
AND g1.dateofgrandslam > g.dateofgrandslam
);
Aside: Using the lower case for dateofgrandslam
, since unquoted identifiers are cast to lower case anyway in Postgres.
Upvotes: 0
Reputation: 10908
Break the problem down.
1) Build a query that gets you the players whose last slam was at home. I.e., their most recent home slam was their most recent slam.
SELECT
playerName,
MAX(dateOfGrandSlam) mostRecentGrandSlam
FROM grandSlams
GROUP BY playerName
HAVING MAX(CASE home WHEN 'true' THEN dateOfGrandSlam END) = MAX(dateOfGrandSlam)
2) Use that query to filter the original table to just the rows you want:
SELECT a.playerName, a.dateOfGrandSlam, a.distance, a.home
FROM grandSlams a
INNER JOIN (
SELECT
playerName,
MAX(dateOfGrandSlam) mostRecentGrandSlam
FROM grandSlams
GROUP BY playerName
HAVING MAX(CASE home WHEN 'true' THEN dateOfGrandSlam END) = MAX(dateOfGrandSlam)
) b ON (a.playerName = b.playerName AND a.dateOfGrandSlam = b.mostRecentGrandSlam)
Upvotes: 0
Reputation: 462
You can first find all of the last home runs, then join them to the original table.
SELECT g.* from (SELECT playerName, MAX(date) date
FROM grandslams
GROUP BY playerName) a
NATURAL JOIN grandslams g
WHERE g.home = true;
If you are not a fan of natural join (for ansi's sake) you can always use
JOIN grandslams g on g.playerName = a.playerName and g.date = a.date
instead of the natural join. this would make the query ansi.
Upvotes: 0
Reputation: 132
Try this:
SELECT playerName, MAX(date), distance, home
FROM grandslams
GROUP BY playerName, distance, home
WHERE home='true'
you may need to use " or ' depending on what type of SQL you are using.
However, I am a little confused by your question due to this:
"I want to get a list of players whose most recent grandslam was at home. If their most recent grandslam wasn't at home, I don't want them to show up on my list." and then "This is not what I want - I want ONLY the most recent home grandslams IF there were no away grandslams more recently."
Are you looking for the most recent grandslam in general? or just the most recent home grandslam?
Upvotes: -1
Reputation: 1269883
You can find the most recent grand slam for each player by using the row_number()
function to enumerate the grand slams for each player, ordered by the date in descending order. To choose the most recent, choose the one where the value is 1. Then add the condition on the grand slam being from home:
select gs.*
from (select gs.*,
row_number() over (partition by PlayerName
order by dateOfGrandSlam desc) as seqnum
from GrandSlams gs
) gs
where seqnum = 1 and home = true;
Upvotes: 3