J-t
J-t

Reputation: 9

Bigquery using current row's value across window functions

I would like to calculate the average and STDDEV for the normailzed time of a race. I first want to calculate the pace of each racer by doing finish_time / distance. Next, I want to do an average normalized to the current rows distance.

So for example to calculate the average for row 5 not including row 5: AVG(Pace * current_distance) = (Pace_1 * distance_5 + Pace_2 * distance_5 + Pace_3 * distance_5 + Pace_4 * distance_5 ) / 4

Pace_1 = finish_time_1 / distance_1

This is how I have tried to do it two different ways but it is not working:

Upvotes: 0

Views: 131

Answers (1)

Samuel
Samuel

Reputation: 3528

The task is to calculate the expected finish time for a certain sport racing event from the previous pace times.

Let's start with some random data in CTE sample. Then in each tbl addition columns are calculated. Please use in the window function partition by player to differ between the player. win1 is for all races by a player, win2 is only for the previous ones. Since the task is to forecast the finish time, we are summing over all finish times and substract the one from the current race. Then division by the racing counts minus one for the current one, gives the average. For STDDEV use the formula \sqrt{sum over i (x-average)^2 / N}

With sample as (
  Select "racer1" as player,offset as date,  finish_time, (offset+1)*1000 distance
  from 
  unnest([100,200,330,400,500]) finish_time with offset
),
tbl1 as (
  Select *,
  finish_time / distance as Pace
  from sample
  window win1 as (partition by player)
),
tbl2 as (
  Select *,
  avg(pace * distance) over win1 as avg_normalized_finishing,
  avg(pace * distance) over win2 as avg_normalized_finishing_previous_ones,
  sum(pace * distance) over win1 - pace * distance as sum_finishing_times,
  (sum(pace * distance) over win1 - pace * distance) / (count(pace * distance) over win1 -1) as avg_finishing_times,
  count(pace * distance) over win1 as races_done
  , 
  from tbl1
  window win1 as (partition by player), 
    win2 as (partition by player ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) 
),
tbl3 as (
Select * ,
SQRT((sum( pow(pace * distance- avg_finishing_times,2) )  over win1 - pow(pace * distance- avg_finishing_times,2))/ (races_done-1)) as STDDEV 
from tbl2
window win1 as (partition by player), 
    win2 as (partition by player ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) 
)

Select * from tbl3

Upvotes: 0

Related Questions