xxx222
xxx222

Reputation: 3244

How to do weighted moving average with a weight column in Hive/SQL?

I have a column weights and a column recall, I want to calculate the weighted average of recall over last 7 days based on weights, I already have the query that calculate the moving average with OVER statement, however, for calculating the weighted moving average, I think I will have to replace AVG the aggregation function to some other functions that takes two columns like WEIGHTED_AVG(recall, weights), is there any way I can do that? Here is my current query.

SELECT 
weights,
recall,
avg(recall) OVER(ROWS 6 PRECEDING) AS recall_ma_week FROM TableA;

Example TableA would be:

recall weight
 0.5      3
 0.2      2
 0.3      4
 0.4      2
 0.3      6
 0.2      8

And for the last row for example, the number I want to calculate should be 0.2*8 + 0.3*6 + ... / (8 + 6 + ...), in formula it is, sum(weight * recall) / sum(weight), and when sum(weight) == 0, return that value as 1.0. If I were able to create a customized aggregation function that would be awesome but I don't know how to do it in pure SQL.

Upvotes: 0

Views: 1003

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269643

Just use arithmetic:

SELECT (sum(recall * weight) OVER (ORDER BY report_created_at ASC ROWS 6 PRECEDING) /
        (case when sum(weight) OVER (ORDER BY report_created_at ASC ROWS 6 PRECEDING), 0) <> 0
              then sum(weight) OVER (ORDER BY report_created_at ASC ROWS 6 PRECEDING), 0)
         end)
       ) AS recall_ma_week

Upvotes: 1

Related Questions