Reputation: 47
I am trying to pull data from different columns using multiple criteria, but having trouble figuring out how, I believe it is a correlated subquery I need and have experimented doing a few different ways but can't figure it out.
I am looking to get the averages for the Miami Heat's win for the following categories + the averages of the New York Knicks losses for the following categories and combine them into one average.
So this is my query for the Heat which retrieves exactly what i want
SELECT
box_score.team_name,
ROUND(AVG(eFG),3) eFG,
ROUND(AVG(OPP_eFG),3) OPP_eFG,
ROUND(AVG(TOV_PCT),3) TOV_PCT,
ROUND(AVG(OPP_TOV_PCT),3) OPP_TOV_PCT,
ROUND(AVG(ORB_PCT),3) ORB_PCT,
ROUND(AVG(DRB_PCT),3) DRB_PCT,
ROUND(AVG(FTA_RATE),3) FTA_RATE,
ROUND(AVG(OPP_FTA_RATE),3) OPP_FTA_RATE
FROM box_score
WHERE team_name = 'Miami Heat' AND WIN_LOSS = 'W' AND game_date < '2019-03-07'
I also did the losses for the knicks which also resulted in what i wanted
WHERE team_name = 'New York Knicks' AND WIN_LOSS = 'L' AND game_date < '2019-03-07'
My problem is trying to combine the two into one query where I get the averages in the Heat's wins and the average in knick's losses. all of this information is from the same table and I can get team information from an id number or by name... I am using SQLite if that changes anything
This is the result of running the query which is what i'm looking for one row of data with the average... however I would like the average of these numbers in both Heat wins and Knicks losses combined into one row
Averages in Heat Wins
eFG OPP_eFG TOV_PCT OPP_TOV_PCT ORB_PCT DRB_PCT FTA_RATE OPP_FTA_RATE
0.603 0.505 0.14 0.126 0.28 0.77 0.235 0.141
These are the averages in knicks losses
eFG OPP_eFG TOV_PCT OPP_TOV_PCT ORB_PCT DRB_PCT FTA_RATE OPP_FTA_RATE
0.568 0.602 0.146 0.136 0.225 0.787 0.222 0.235
I want to get the two combined into 1 average for each category
but is there any way to make it so I get the averages to pull data from seperate columns?
In this case I'm interested in the Miami Heat, so I have the averages above but I what i'm trying to do is get the averages from the Heat to the corresponding opposite stat for the knicks (eFG should correlate with opp_eFG of other team and so on)...so basically I am looking for the averages of the following:
Heat eFG and OPP_eFG Knicks
Heat OPP_eFG and eFG Knicks
Heat TOV_PCT and OPP_TOV_PCT Knicks
Heat OPP_TOV_PCT and TOV_PCT Knicks
Heat FTA_RATE and OPP_FTA_RATE Knicks
Heat OPP_FTA_RATE and FTA_RATE Knicks
still looking to get 1 row as the result
Upvotes: 0
Views: 87
Reputation: 1270401
If you want to calculate the averages first and then average them, you can use two levels of aggregation:
SELECT ROUND(AVG(eFG), 3) as eFG,
ROUND(AVG(OPP_eFG), 3) as OPP_eFG,
ROUND(AVG(TOV_PCT), 3) as TOV_PCT,
ROUND(AVG(OPP_TOV_PCT), 3) as OPP_TOV_PCT,
ROUND(AVG(ORB_PCT), 3) as ORB_PCT,
ROUND(AVG(DRB_PCT), 3) as DRB_PCT,
ROUND(AVG(FTA_RATE), 3) as FTA_RATE,
ROUND(AVG(OPP_FTA_RATE), 3) as OPP_FTA_RATE
FROM (SELECT bs.team_name,
AVG(eFG) as eFG,
AVG(OPP_eFG) as OPP_eFG,
AVG(TOV_PCT) as TOV_PCT,
AVG(OPP_TOV_PCT) as OPP_TOV_PCT,
AVG(ORB_PCT) as ORB_PCT,
AVG(DRB_PCT) as DRB_PCT,
AVG(FTA_RATE) as FTA_RATE,
AVG(OPP_FTA_RATE) as OPP_FTA_RATE
FROM box_score bs
WHERE game_date < '2019-03-07' AND
( (team_name = 'Miami Heat' AND WIN_LOSS = 'W') OR
(team_name = 'New York Knicks' AND WIN_LOSS = 'L')
)
) bs
Upvotes: 1
Reputation: 634
I'd like to promote Common Table Expressions for this:
WITH selector_heat as (
SELECT
box_score.team_name,
ROUND(AVG(eFG),3) eFG,
ROUND(AVG(OPP_eFG),3) OPP_eFG,
ROUND(AVG(TOV_PCT),3) TOV_PCT,
ROUND(AVG(OPP_TOV_PCT),3) OPP_TOV_PCT,
ROUND(AVG(ORB_PCT),3) ORB_PCT,
ROUND(AVG(DRB_PCT),3) DRB_PCT,
ROUND(AVG(FTA_RATE),3) FTA_RATE,
ROUND(AVG(OPP_FTA_RATE),3) OPP_FTA_RATE
FROM box_score
WHERE team_name = 'Miami Heat' AND WIN_LOSS = 'W' AND game_date < '2019-03-07'
)
, selector_knicks as (
...
)
select H.eFG - K.OPP_eFG as magic_nbr
from selector_heat H
join selector_knicks K ON (1=1)
More details on syntax here: https://www.sqlite.org/lang_with.html but ignore "recursive" bits for now, you don't need them in this instance.
Alternatively (and with a slightly different approach angle) you could use Window clauses to aggregate "per team" and then use the results. More info here: https://www.sqlite.org/windowfunctions.html#introduction_to_window_functions
Example:
SELECT
team_name,
WIN_LOSS,
ROUND(AVG(eFG) OVER (partition by team_name, win_loss),3) as eFG
...
from box_score
where game_date < '2019-03-07'
With this resultset you have your averages for all teams and win_loss combinations. Wrap this in a CTE and join to itself on conditions to suit, e.g.
WITH cte as (SELECT ...)
SELECT H.eFG - K.OPP_eFG as magic_nbr
FROM cte H join cte K
ON (H.team_name = 'Miami Heat'
AND K.team_name = 'NY Knicks'
AND H.win_loss = 'W'
AND K.win_loss = 'L')
Upvotes: 1
Reputation: 222582
One solution would be to perform the whole operation in a single table scan (without joins or subqueries), using conditional aggregation:
SELECT
box_score.team_name,
ROUND(AVG(CASE WHEN team_name = 'Miami Heat' AND WIN_LOSS = 'W' THEN eFG END),3) Heat_eFG,
ROUND(AVG(CASE WHEN team_name = 'New York Knicks' AND WIN_LOSS = 'L' THEN eFG END),3) Knicks_eFG,
ROUND(AVG(CASE WHEN team_name = 'Miami Heat' AND WIN_LOSS = 'W' THEN OPP_eFG END),3) Heat_OPP_eFG,
ROUND(AVG(CASE WHEN team_name = 'New York Knicks' AND WIN_LOSS = 'L' THEN OPP_eFG END),3) Knicks_OPP_eFG,
ROUND(AVG(CASE WHEN team_name = 'Miami Heat' AND WIN_LOSS = 'W' THEN TOV_PCT END),3) Heat_TOV_PCT,
ROUND(AVG(CASE WHEN team_name = 'New York Knicks' AND WIN_LOSS = 'L' THEN TOV_PCT END),3) Knicks_TOV_PCT,
ROUND(AVG(CASE WHEN team_name = 'Miami Heat' AND WIN_LOSS = 'W' THEN OPP_TOV_PCT END),3) Heat_OPP_TOV_PCT,
ROUND(AVG(CASE WHEN team_name = 'New York Knicks' AND WIN_LOSS = 'L' THEN OPP_TOV_PCT END),3) Knicks_OPP_TOV_PCT,
ROUND(AVG(CASE WHEN team_name = 'Miami Heat' AND WIN_LOSS = 'W' THEN ORB_PCT END),3) Heat_ORB_PCT,
ROUND(AVG(CASE WHEN team_name = 'New York Knicks' AND WIN_LOSS = 'L' THEN ORB_PCT END),3) Knicks_ORB_PCT,
ROUND(AVG(CASE WHEN team_name = 'Miami Heat' AND WIN_LOSS = 'W' THEN DRB_PCT END),3) Heat_DRB_PCT,
ROUND(AVG(CASE WHEN team_name = 'New York Knicks' AND WIN_LOSS = 'L' THEN DRB_PCT END),3) Knicks_DRB_PCT,
ROUND(AVG(CASE WHEN team_name = 'Miami Heat' AND WIN_LOSS = 'W' THEN FTA_RATE END),3) Heat_FTA_RATE,
ROUND(AVG(CASE WHEN team_name = 'New York Knicks' AND WIN_LOSS = 'L' THEN FTA_RATE END),3) Knicks_FTA_RATE,
ROUND(AVG(CASE WHEN team_name = 'Miami Heat' AND WIN_LOSS = 'W' THEN OPP_FTA_RATE END),3) Heat_OPP_FTA_RATE,
ROUND(AVG(CASE WHEN team_name = 'New York Knicks' AND WIN_LOSS = 'L' THEN OPP_FTA_RATE END),3) Knicks_OPP_FTA_RATE
FROM box_score
WHERE team_name IN ('Miami Heat', 'New York Knicks') AND game_date < '2019-03-07'
Here is another version of the query if you are looking to compute the average, for example of both eFG
for Miami wins AND OPP_eFG
on New York losses in a single column. This still relies on conditional aggregation. I also slighly simplified the logic by moving conditions to the WHERE
clause.
SELECT
box_score.team_name,
ROUND(AVG(CASE
WHEN team_name = 'Miami Heat' THEN eFG
WHEN team_name = 'New York Knicks' THEN OPP_eFG
END, 3) Heats_eFG_Knicks_OPP_eFG,
ROUND(AVG(CASE
WHEN team_name = 'Miami Heat' THEN OPP_eFG
WHEN team_name = 'New York Knicks' THEN eFG
END, 3) Heats_OPP_eFG_Knicks_eFG,
ROUND(AVG(CASE
WHEN team_name = 'Miami Heat' THEN TOV_PCT
WHEN team_name = 'New York Knicks' THEN OPP_TOV_PCT
END, 3) Heats_TOV_PCT_Knicks_OPP_TOV_PCT,
ROUND(AVG(CASE
WHEN team_name = 'Miami Heat' THEN OPP_TOV_PCT
WHEN team_name = 'New York Knicks' THEN TOV_PCT
END, 3) Heats_OPP_TOV_PCT_Knicks_TOV_PCT,
ROUND(AVG(CASE
WHEN team_name = 'Miami Heat' THEN FTA_RATE
WHEN team_name = 'New York Knicks' THEN OPP_FTA_RATE
END, 3) Heats_FTA_RATE_Knicks_OPP_FTA_RATE,
ROUND(AVG(CASE
WHEN team_name = 'Miami Heat' THEN OPP_FTA_RATE
WHEN team_name = 'New York Knicks' THEN FTA_RATE
END, 3) Heats_OPP_FTA_RATE_Knicks_FTA_RATE
FROM box_score
WHERE
game_date < '2019-03-07'
AND (
( team_name = 'Miami Heat' AND win_loss = 'W' )
OR ( team_name = 'New York Knicks' AND win_loss = 'L')
)
NB: as commented by wildpasser, you probably want to use single quotes instead of double quotes around litteral values (this is the SQL standard). I globally all of the double quotes in the original query to single quotes.
Upvotes: 0