DataWrangler1980
DataWrangler1980

Reputation: 553

BigQuery SQL, Obtain Median, Grouped by Date?

When trying to obtain, say the median using the partition by window function, I receive an error message "SELECT list expression references column seller_stock which is neither grouped nor aggregated", why is this, how must i write this SQL differently? I have many records per day, and i want to return the median for each day ...

SELECT date(snapshot_date) AS period, 
  PERCENTILE_DISC(**seller_stock**, 0.5) OVER (PARTITION BY snapshot_date) AS median_stock
FROM `table.name`  
WHERE snapshot_date >= "2022-04-01" 
GROUP BY snapshot_date

Upvotes: 5

Views: 13740

Answers (2)

Matthew Cox
Matthew Cox

Reputation: 1194

One way to accomplish this with an aggregate function which would avoid requiring a subquery is by using the APPROX_QUANTILES function.

SELECT date(snapshot_date) AS period, 
  APPROX_QUANTILES(seller_stock, 100)[OFFSET(50)] AS median_stock
FROM `table.name`  
WHERE snapshot_date >= "2022-04-01" 
GROUP BY snapshot_date

This will pick 50th quantile value.

Upvotes: 7

Chaotic Pechan
Chaotic Pechan

Reputation: 966

The thing is that you cannot group by an AGG function, since you are getting already the median there over by your rows, you will need just the top row of that statement.

You can use an intermediate table or aux.

This is an example:

with median_data as (
  select 
    date(snapshot_date) AS period,
    PERCENTILE_DISC(seller_stock, 0.5) OVER (PARTITION BY snapshot_date) AS median_stock,
    row_number() over(order by snapshot_date) as r
  from `table.name`
  where snapshot_date >= "2022-04-01" 
)

select period,median_stock from median_data where r =  1

Upvotes: 8

Related Questions