Canovice
Canovice

Reputation: 10441

In BigQuery, compute difference between two rows in group by

with
  my_stats as (
    select 24996 as competitionId, 17 as playerId, 'on' as onOff, 8 as fga, 4 as fgm, 0.50 as fgPct union all
    select 24996 as competitionId, 17 as playerId, 'off' as onOff, 5 as fga, 3 as fgm, 0.60 as fgPct union all
    select 24996 as competitionId, 24 as playerId, 'on' as onOff, 9 as fga, 6 as fgm, 0.67 as fgPct union all
    select 24996 as competitionId, 24 as playerId, 'off' as onOff, 3 as fga, 1 as fgm, 0.33 as fgPct union all
    select 24996 as competitionId, 27 as playerId, 'on' as onOff, 5 as fga, 4 as fgm, 0.8 as fgPct
  ),
  
  my_output as (
    select 24996 as competitionId, 17 as playerId, 'diff' as onOff, 3 as fga, 1 as fgm, -0.1 as fgPct union all
    select 24996 as competitionId, 24 as playerId, 'diff' as onOff, 6 as fga, 5 as fgm, 0.34 as fgPct
  )
  

select * from my_stats
select * from my_output

Here is a simple example to demonstrate the problem we are struggling with. We have the table my_stats, where the primary key is a combination of competitionId, playerId, onOff, and where the onOff column can only be 'on' or 'off'. For a single competitionId, playerId then (for which there are two rows, one for 'on', one for 'off'), we would like to subtract the values (on - off) from all other columns.

Hopefully the my_output table makes clear what output we need for this. In the case of playerId = 27, since this player has no 'off' row, they can simply be dropped from the output since there are no calculations to make.

Upvotes: 0

Views: 846

Answers (3)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173046

Below is for BigQuery Standard SQL

#standardSQL
SELECT competitionId, playerId, 'diff' AS onOff,
  SUM(onOffSign * fga) AS fga,
  SUM(onOffSign * fgm) AS fgm,
  SUM(onOffSign * fgPct) AS fgPct  
FROM my_stats, 
  UNNEST([IF(onOff = 'on', 1, -1)]) onOffSign
GROUP BY competitionId, playerId
HAVING COUNT(1) = 2  

Upvotes: 0

dnoeth
dnoeth

Reputation: 60482

Another solution based on a self-join:

select
    t1.competitionId,
    t1.playerId,
    'diff' as onOff,
    t1.fga - t2.fga as fga,
    t1.fgm - t2.fgm as fgm,
    t1.fgpct - t2.fgpct as fgpct
from my_stats as t1
join my_stats as t2
  on t1.competitionId = t2.competitionId
 and t1.playerId = t2.playerId
where t1.onOff = 'on'
  and t2.onOff = 'off'

You should check which apporoach is more efficient

Upvotes: 1

GMB
GMB

Reputation: 222582

You can do conditional aggregation:

select
    competitionId,
    playerId,
    'diff' as onOff,
    sum(case when onOff = 'on' then fga   else - fga   end) fga,
    sum(case when onOff = 'on' then fgm   else - fgm   end) fga,
    sum(case when onOff = 'on' then fgpct else - fgpct end) fgpct
from my_stats
where onOff in ('on', 'off')
group by competitionId, playerId
having count(*) = 2

This groups data by competition and player, then the conditional sum()s compute the difference between "on" and "off" values for each column. The having clause filters out groups that do not have both records available.

Upvotes: 1

Related Questions