Reputation: 23
Can anybody tell me how to calculate a moving average in BigQuery?
Here is what I need, in MySQL style:
SELECT T1.id, T1.value_column1, AVG(T2.value_column1)
FROM table1 T1
INNER JOIN table1 T2 ON T2.Id BETWEEN T1.Id-19 AND T1.Id
GROUP BY T1.id, T1.value_column1
Upvotes: 2
Views: 4601
Reputation: 59165
For an updated and more efficient answer, see https://stackoverflow.com/a/24943950/132438.
Check the new LAG() and LEAD() window functions. They allow you to traverse the result set, without the need for a self join.
https://developers.google.com/bigquery/docs/query-reference#windowfunctions
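To see concretely what the windowed approach computes, here is a plain Python sketch of a trailing 20-row moving average, mirroring the `T2.Id BETWEEN T1.Id - 19 AND T1.Id` semantics of the MySQL query in the question (the function name and sample data are made up for illustration):

```python
# Sketch of a trailing moving average: for each row i, average the
# values of the last `window` rows up to and including i. This is what
# the self-join "T2.Id BETWEEN T1.Id - 19 AND T1.Id" (window=20) computes.
def trailing_avg(values, window=20):
    result = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1):i + 1]
        result.append(sum(chunk) / len(chunk))
    return result

print(trailing_avg([1, 2, 3, 4], window=3))  # [1.0, 1.5, 2.0, 3.0]
```

Note that the first rows average over fewer than `window` values, just as the SQL self join produces smaller groups near the start of the table.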
A different option is a self join with JOIN EACH (this can get too slow, as an extremely large amount of data can be generated in the intermediate steps):
SELECT a.SensorId SensorId, a.Timestamp, AVG(b.Data) AS avg_prev_hour_load
FROM (
SELECT * FROM [io_sensor_data.moscone_io13]
WHERE SensorId = 'XBee_40670EB0/mic') a
JOIN EACH [io_sensor_data.moscone_io13] b
ON a.SensorId = b.SensorId
WHERE b.Timestamp BETWEEN (a.Timestamp - 3600000) AND a.Timestamp
GROUP BY SensorId, a.Timestamp;
(based on Joe Celko's SQL problems)
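The self join above pairs each reading with all readings from the same sensor in the previous hour, then averages per timestamp. A rough Python sketch of that per-row windowing (timestamps in milliseconds to match the `a.Timestamp - 3600000` arithmetic; the sample data is invented):

```python
# Sketch of the "previous hour" average from the self join above:
# for each row's timestamp ts, average all values whose timestamp
# falls in [ts - 1 hour, ts]. Timestamps are in milliseconds.
HOUR_MS = 3600000

def prev_hour_avg(rows):
    """rows: list of (timestamp_ms, value); returns (timestamp_ms, avg) pairs."""
    out = []
    for ts, _ in rows:
        window = [v for t, v in rows if ts - HOUR_MS <= t <= ts]
        out.append((ts, sum(window) / len(window)))
    return out

readings = [(0, 2.0), (1800000, 4.0), (7200000, 6.0)]
print(prev_hour_avg(readings))  # [(0, 2.0), (1800000, 3.0), (7200000, 6.0)]
```

This is O(n^2) in Python just as the self join is quadratic in the intermediate result, which is exactly why the answer warns about large data sets.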
Upvotes: 2
Reputation: 3172
I have created the following "Times" table:
Table Details: Dim_Periods
Schema
Date TIMESTAMP
Year INTEGER
Month INTEGER
day INTEGER
QUARTER INTEGER
DAYOFWEEK INTEGER
MonthStart TIMESTAMP
MonthEnd TIMESTAMP
WeekStart TIMESTAMP
WeekEnd TIMESTAMP
Back30Days TIMESTAMP -- the date 30 days before "Date"
Back7Days TIMESTAMP -- the date 7 days before "Date"
and I use a query like the following to handle "running sums":
SELECT Date,Count(*) as MovingCNT
FROM
(SELECT Date,
Back7Days
FROM DWH.Dim_Periods
where Date < TIMESTAMP(CURRENT_DATE()) AND
Date >= DATE_ADD(CURRENT_TIMESTAMP(), -5, 'MONTH')
) P
CROSS JOIN EACH
(SELECT repository_url,repository_created_at
FROM publicdata:samples.github_timeline
) L
WHERE timestamp(repository_created_at)>= Back7Days
AND timestamp(repository_created_at)<= Date
GROUP EACH BY Date
Note that it can be used for "Month to Date", "Week to Date", "30 days back", etc. aggregations as well. However, performance is not the best, and the query can take a while on larger data sets due to the Cartesian join. Hope this helps.
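The date-dimension trick above boils down to: for each period-end date, count the events whose timestamp falls between its precomputed `Back7Days` and `Date`. A small Python sketch of that logic (the dates and events are invented sample data):

```python
from datetime import date, timedelta

# Sketch of the date-dimension approach: for each period-end date d,
# count events whose date lies in the trailing window [d - 7 days, d],
# mirroring the "Back7Days <= created_at <= Date" filter in the query.
def trailing_count(period_ends, events, days_back=7):
    counts = {}
    for d in period_ends:
        start = d - timedelta(days=days_back)
        counts[d] = sum(1 for e in events if start <= e <= d)
    return counts

events = [date(2014, 1, 1), date(2014, 1, 5), date(2014, 1, 9)]
ends = [date(2014, 1, 6), date(2014, 1, 10)]
print(trailing_count(ends, events))
```

Swapping `days_back` for 30, or for a month-start boundary, gives the "30 days back" and "Month to Date" variants the answer mentions.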
Upvotes: 0
Reputation: 26617
You can do the same thing, but since BigQuery only allows joins on equality, it takes a little more work. Here is an example that computes the 6-month moving average of birth weights from the public natality sample.
SELECT
--Convert months-since-year-0 back to year, month
INTEGER(month / 12) as year,
month % 12 as month,
avg
FROM (
SELECT month,
-- Note that this average is the average over all of the data in the
-- last 6 months, not an average over the avg values for the last 6 months.
-- It is easy to compute the latter, if that is what is desired -- just
-- compute the average in the inner select, and take the average of those
-- here.
SUM(total_weight_per_month) / SUM(records_per_month) as avg
FROM (
SELECT
-- Note we use t2.month here since that is what is compared against
-- 6 different t1 months.
t2.month as month,
t1.records_per_month as records_per_month,
t1.total_weight_per_month as total_weight_per_month
FROM (
SELECT month,
COUNT( weight_pounds ) as records_per_month,
SUM( weight_pounds) as total_weight_per_month,
-- This active field is the key that lets us join all of the
-- values against the values in the date subselect.
1 AS active
FROM (
SELECT
-- Convert year and month fields to a single value that
-- has the number of months since year 0. This will allow
-- us to do math on the dates.
year * 12 + month AS month,
weight_pounds
FROM [publicdata:samples.natality]
WHERE weight_pounds > 0)
GROUP BY month) as t1
JOIN
-- We join the weights per month against a subselect that contains
-- all months.
(SELECT month, 1 as active
FROM
(SELECT
year * 12 + month AS month
FROM [publicdata:samples.natality])
GROUP BY month) as t2
ON t1.active = t2.active
-- Here is where we get the moving average -- we basically take the month
-- value from t1 and make it apply for 6 months.
WHERE t1.month >= t2.month AND t1.month - 6 < t2.month)
GROUP BY month
ORDER BY month desc)
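The comment in the inner select makes a distinction worth seeing concretely: summing totals and counts over the window gives the average over all records, which is generally not the same as averaging the monthly averages. A small Python sketch with invented per-month figures:

```python
# Invented per-month (record_count, total_weight) pairs:
# month A averages 7.0 per record, month B averages 9.0 per record.
months = [(10, 70.0), (2, 18.0)]

# SUM(total_weight_per_month) / SUM(records_per_month), as in the query:
# the average over all records in the window.
overall = sum(t for _, t in months) / sum(c for c, _ in months)

# The average of the monthly averages: weights every month equally,
# regardless of how many records it has.
of_avgs = sum(t / c for c, t in months) / len(months)

print(overall)  # 88 / 12, about 7.33
print(of_avgs)  # (7.0 + 9.0) / 2 = 8.0
```

The query computes the first form, which weights busy months more heavily; the second form would be computed by averaging in the inner select instead, as the comment notes.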
Upvotes: 4