GDolan
GDolan

Reputation: 47

Correlated Subquery? pulling data from different columns, same table

I am trying to pull data from different columns using multiple criteria, but having trouble figuring out how, I believe it is a correlated subquery I need and have experimented doing a few different ways but can't figure it out.

I am looking to get the averages for the Miami Heat's win for the following categories + the averages of the New York Knicks losses for the following categories and combine them into one average.

So this is my query for the Heat which retrieves exactly what i want

SELECT
    box_score.team_name, 
    ROUND(AVG(eFG),3) eFG,
    ROUND(AVG(OPP_eFG),3) OPP_eFG,
    ROUND(AVG(TOV_PCT),3) TOV_PCT,
    ROUND(AVG(OPP_TOV_PCT),3) OPP_TOV_PCT,
    ROUND(AVG(ORB_PCT),3) ORB_PCT,
    ROUND(AVG(DRB_PCT),3) DRB_PCT,
    ROUND(AVG(FTA_RATE),3) FTA_RATE,
    ROUND(AVG(OPP_FTA_RATE),3) OPP_FTA_RATE
FROM box_score
WHERE team_name = 'Miami Heat' AND WIN_LOSS = 'W' AND game_date < '2019-03-07' 

I also did the losses for the knicks which also resulted in what i wanted

WHERE team_name = 'New York Knicks' AND WIN_LOSS = 'L' AND game_date < '2019-03-07' 

My problem is trying to combine the two into one query where I get the averages in the Heat's wins and the average in knick's losses. all of this information is from the same table and I can get team information from an id number or by name... I am using SQLite if that changes anything

This is the result of running the query which is what i'm looking for one row of data with the average... however I would like the average of these numbers in both Heat wins and Knicks losses combined into one row

Averages in Heat Wins

eFG    OPP_eFG  TOV_PCT  OPP_TOV_PCT  ORB_PCT  DRB_PCT  FTA_RATE  OPP_FTA_RATE
0.603  0.505    0.14     0.126        0.28     0.77     0.235     0.141

These are the averages in knicks losses

eFG    OPP_eFG  TOV_PCT  OPP_TOV_PCT  ORB_PCT  DRB_PCT  FTA_RATE  OPP_FTA_RATE
0.568  0.602    0.146    0.136        0.225    0.787    0.222     0.235

I want to get the two combined into 1 average for each category

but is there any way to make it so I get the averages to pull data from seperate columns?

In this case I'm interested in the Miami Heat, so I have the averages above but I what i'm trying to do is get the averages from the Heat to the corresponding opposite stat for the knicks (eFG should correlate with opp_eFG of other team and so on)...so basically I am looking for the averages of the following:

Heat eFG and OPP_eFG Knicks

Heat OPP_eFG and eFG Knicks

Heat TOV_PCT and OPP_TOV_PCT Knicks

Heat OPP_TOV_PCT and TOV_PCT Knicks

Heat FTA_RATE and OPP_FTA_RATE Knicks

Heat OPP_FTA_RATE and FTA_RATE Knicks

still looking to get 1 row as the result

Upvotes: 0

Views: 87

Answers (3)

Gordon Linoff
Gordon Linoff

Reputation: 1270401

If you want to calculate the averages first and then average them, you can use two levels of aggregation:

SELECT ROUND(AVG(eFG), 3) as eFG,
       ROUND(AVG(OPP_eFG), 3) as OPP_eFG,
       ROUND(AVG(TOV_PCT), 3) as TOV_PCT,
       ROUND(AVG(OPP_TOV_PCT), 3) as OPP_TOV_PCT,
       ROUND(AVG(ORB_PCT), 3) as ORB_PCT,
       ROUND(AVG(DRB_PCT), 3) as DRB_PCT,
       ROUND(AVG(FTA_RATE), 3) as FTA_RATE,
       ROUND(AVG(OPP_FTA_RATE), 3) as OPP_FTA_RATE
FROM (SELECT bs.team_name, 
             AVG(eFG) as eFG,
             AVG(OPP_eFG) as OPP_eFG,
             AVG(TOV_PCT) as TOV_PCT,
             AVG(OPP_TOV_PCT) as OPP_TOV_PCT,
             AVG(ORB_PCT) as ORB_PCT,
             AVG(DRB_PCT) as DRB_PCT,
             AVG(FTA_RATE) as FTA_RATE,
             AVG(OPP_FTA_RATE) as OPP_FTA_RATE
      FROM box_score bs
      WHERE game_date < '2019-03-07' AND
            ( (team_name = 'Miami Heat' AND WIN_LOSS = 'W') OR
              (team_name = 'New York Knicks' AND WIN_LOSS = 'L')
            )
     ) bs

Upvotes: 1

Megadest
Megadest

Reputation: 634

This answer assumes that you want an AVG(heat)-AVG(knicks), as per original post, not AVG(heatsX OR knicksY)

I'd like to promote Common Table Expressions for this:

WITH selector_heat as (
SELECT
    box_score.team_name, 
    ROUND(AVG(eFG),3) eFG,
    ROUND(AVG(OPP_eFG),3) OPP_eFG,
    ROUND(AVG(TOV_PCT),3) TOV_PCT,
    ROUND(AVG(OPP_TOV_PCT),3) OPP_TOV_PCT,
    ROUND(AVG(ORB_PCT),3) ORB_PCT,
    ROUND(AVG(DRB_PCT),3) DRB_PCT,
    ROUND(AVG(FTA_RATE),3) FTA_RATE,
    ROUND(AVG(OPP_FTA_RATE),3) OPP_FTA_RATE
FROM box_score
WHERE team_name = 'Miami Heat' AND WIN_LOSS = 'W' AND game_date < '2019-03-07' 
)
, selector_knicks as (
...
)
select H.eFG - K.OPP_eFG as magic_nbr
from selector_heat H 
join selector_knicks K ON (1=1)

More details on syntax here: https://www.sqlite.org/lang_with.html but ignore "recursive" bits for now, you don't need them in this instance.

Alternatively (and with a slightly different approach angle) you could use Window clauses to aggregate "per team" and then use the results. More info here: https://www.sqlite.org/windowfunctions.html#introduction_to_window_functions

Example:

SELECT  
  team_name, 
  WIN_LOSS,
  ROUND(AVG(eFG) OVER (partition by team_name, win_loss),3) as eFG
  ...
  from box_score
  where game_date < '2019-03-07'

With this resultset you have your averages for all teams and win_loss combinations. Wrap this in a CTE and join to itself on conditions to suit, e.g.

WITH cte as (SELECT ...)
SELECT H.eFG - K.OPP_eFG as magic_nbr
FROM cte H join cte K 
  ON (H.team_name = 'Miami Heat' 
  AND K.team_name = 'NY Knicks'
  AND H.win_loss = 'W'
  AND K.win_loss = 'L')

Upvotes: 1

GMB
GMB

Reputation: 222582

One solution would be to perform the whole operation in a single table scan (without joins or subqueries), using conditional aggregation:

SELECT  
    box_score.team_name, 
    ROUND(AVG(CASE WHEN team_name = 'Miami Heat'      AND WIN_LOSS = 'W' THEN eFG          END),3) Heat_eFG,
    ROUND(AVG(CASE WHEN team_name = 'New York Knicks' AND WIN_LOSS = 'L' THEN eFG          END),3) Knicks_eFG,
    ROUND(AVG(CASE WHEN team_name = 'Miami Heat'      AND WIN_LOSS = 'W' THEN OPP_eFG      END),3) Heat_OPP_eFG,
    ROUND(AVG(CASE WHEN team_name = 'New York Knicks' AND WIN_LOSS = 'L' THEN OPP_eFG      END),3) Knicks_OPP_eFG,
    ROUND(AVG(CASE WHEN team_name = 'Miami Heat'      AND WIN_LOSS = 'W' THEN TOV_PCT      END),3) Heat_TOV_PCT,
    ROUND(AVG(CASE WHEN team_name = 'New York Knicks' AND WIN_LOSS = 'L' THEN TOV_PCT      END),3) Knicks_TOV_PCT,
    ROUND(AVG(CASE WHEN team_name = 'Miami Heat'      AND WIN_LOSS = 'W' THEN OPP_TOV_PCT  END),3) Heat_OPP_TOV_PCT,
    ROUND(AVG(CASE WHEN team_name = 'New York Knicks' AND WIN_LOSS = 'L' THEN OPP_TOV_PCT  END),3) Knicks_OPP_TOV_PCT,
    ROUND(AVG(CASE WHEN team_name = 'Miami Heat'      AND WIN_LOSS = 'W' THEN ORB_PCT      END),3) Heat_ORB_PCT,
    ROUND(AVG(CASE WHEN team_name = 'New York Knicks' AND WIN_LOSS = 'L' THEN ORB_PCT      END),3) Knicks_ORB_PCT,
    ROUND(AVG(CASE WHEN team_name = 'Miami Heat'      AND WIN_LOSS = 'W' THEN DRB_PCT      END),3) Heat_DRB_PCT,
    ROUND(AVG(CASE WHEN team_name = 'New York Knicks' AND WIN_LOSS = 'L' THEN DRB_PCT      END),3) Knicks_DRB_PCT,
    ROUND(AVG(CASE WHEN team_name = 'Miami Heat'      AND WIN_LOSS = 'W' THEN FTA_RATE     END),3) Heat_FTA_RATE,
    ROUND(AVG(CASE WHEN team_name = 'New York Knicks' AND WIN_LOSS = 'L' THEN FTA_RATE     END),3) Knicks_FTA_RATE,
    ROUND(AVG(CASE WHEN team_name = 'Miami Heat'      AND WIN_LOSS = 'W' THEN OPP_FTA_RATE END),3) Heat_OPP_FTA_RATE,
    ROUND(AVG(CASE WHEN team_name = 'New York Knicks' AND WIN_LOSS = 'L' THEN OPP_FTA_RATE END),3) Knicks_OPP_FTA_RATE
FROM box_score
WHERE team_name IN ('Miami Heat', 'New York Knicks') AND game_date < '2019-03-07' 

Here is another version of the query if you are looking to compute the average, for example of both eFG for Miami wins AND OPP_eFG on New York losses in a single column. This still relies on conditional aggregation. I also slighly simplified the logic by moving conditions to the WHERE clause.

SELECT  
    box_score.team_name, 
    ROUND(AVG(CASE 
        WHEN team_name = 'Miami Heat'      THEN eFG 
        WHEN team_name = 'New York Knicks' THEN OPP_eFG 
    END, 3) Heats_eFG_Knicks_OPP_eFG, 
    ROUND(AVG(CASE 
        WHEN team_name = 'Miami Heat'      THEN OPP_eFG 
        WHEN team_name = 'New York Knicks' THEN eFG 
    END, 3) Heats_OPP_eFG_Knicks_eFG,
    ROUND(AVG(CASE 
        WHEN team_name = 'Miami Heat'      THEN TOV_PCT 
        WHEN team_name = 'New York Knicks' THEN OPP_TOV_PCT 
    END, 3) Heats_TOV_PCT_Knicks_OPP_TOV_PCT,
    ROUND(AVG(CASE 
        WHEN team_name = 'Miami Heat'      THEN OPP_TOV_PCT 
        WHEN team_name = 'New York Knicks' THEN TOV_PCT 
    END, 3) Heats_OPP_TOV_PCT_Knicks_TOV_PCT,
    ROUND(AVG(CASE 
        WHEN team_name = 'Miami Heat'      THEN FTA_RATE 
        WHEN team_name = 'New York Knicks' THEN OPP_FTA_RATE 
    END, 3) Heats_FTA_RATE_Knicks_OPP_FTA_RATE,
    ROUND(AVG(CASE 
        WHEN team_name = 'Miami Heat'      THEN OPP_FTA_RATE 
        WHEN team_name = 'New York Knicks' THEN FTA_RATE 
    END, 3) Heats_OPP_FTA_RATE_Knicks_FTA_RATE
FROM box_score
WHERE 
    game_date < '2019-03-07' 
    AND (
           ( team_name = 'Miami Heat'      AND win_loss = 'W' )
        OR ( team_name = 'New York Knicks' AND win_loss = 'L') 
    )

NB: as commented by wildpasser, you probably want to use single quotes instead of double quotes around litteral values (this is the SQL standard). I globally all of the double quotes in the original query to single quotes.

Upvotes: 0

Related Questions