Reputation: 4248
This is a sample of data from a sqlite3 database:
playerID nameFirst nameLast HR (homeruns)
bondsba01 Barry Bonds 73
mcgwima01 Mark McGwire 70
sosasa01 Sammy Sosa 66
mcgwima01 Mark McGwire 65
sosasa01 Sammy Sosa 64
sosasa01 Sammy Sosa 63
marisro01 Roger Maris 61
ruthba01 Babe Ruth 60
ruthba01 Babe Ruth 59
foxxji01 Jimmie Foxx 58
greenha01 Hank Greenberg 58
...truncated...
I've used this command to select this data:
SELECT Master.playerID, Master.nameFirst, Master.nameLast, Batting.HR
FROM Master
INNER JOIN Batting on Master.playerID = Batting.playerID
ORDER BY Batting.HR DESC
LIMIT 100;
Baiscally I want to answer the following question: what's the top 20 baseball players by number of homeruns? But, as you can see, some players have multiple entries, because the tables have data for multiple years that player was active.
I already answered the question using pandas
and some other Python commands. But I want to be able to answer the question with a single SQL query. I've tried without success a couple of things with DISTINCT
, GROUP BY
etc.
To be more clear, I want the maximum number of home runs each unique player had. Then I want to sort all those unique names by the number of home runs.
I want to transfrom the above table into this:
playerID nameFirst nameLast HR (homeruns)
bondsba01 Barry Bonds 73
mcgwima01 Mark McGwire 70
sosasa01 Sammy Sosa 66
marisro01 Roger Maris 61
ruthba01 Babe Ruth 60
foxxji01 Jimmie Foxx 58
greenha01 Hank Greenberg 58
I guess this shouldn't be very hard, but I don't have much experience with SQL.
Upvotes: 0
Views: 41
Reputation: 8865
using MAX ()
SELECT M.playerID, M.nameFirst, M.nameLast, MAX(B.HR) Runs
FROM Master M
INNER JOIN Batting B
on M.playerID = B.playerID
GROUP BY M.playerID, M.nameFirst, M.nameLast
Upvotes: 4