Reputation: 23
I am new in HQL and I have a table where:
Is there an easy query to calculate weighted shipmentday average per merchant with HiveQL like I did on Excel? (Since shipmentdays may vary, not every merchant has the same number of rows)
Upvotes: 0
Views: 478
Reputation: 1213
You need to use window function of sum() over(order by time rows between preceding 4 and current row)
.
SELECT
merchantName,
merchantID,
shipmentday,
listing_count,
sum(shipmentday * listing_count) over(order by period rows between preceding 4 and current row) / sum(listing_count) over(order by period rows between preceding 4 and current row) as weighed_ma
FROM (
SELECT *, ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS period
FROM your_table
);
Note: You can ignore the subquery if you have the time column in your data. I'm generating period
because you don't have time column in your sample.
If window functions throws an error, then I'm assuming your Hive version doesn't support the required windows function. Alternatively you can use a CROSS JOIN
. CROSS JOIN
maps all row from the origin table to all rows in the target table (think of it as an left join without any key matching).
WITH a AS (
SELECT *, ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS period
FROM your_table
)
SELECT
merchantName,
merchantID,
shipmentday,
listing_count,
sum(shipmentday_b * listing_count_b) / sum(listing_count_b) AS weighted_ma
FROM (
SELECT
a.*,
b.shipmentday AS shipmentday_b,
b.listing_count AS listing_count_b
FROM a
CROSS JOIN a AS b
WHERE b.rn BETWEEN a.rn - 4 and a.rn
) a
GROUP BY merchantName, merchantID, shipmentday,listing_count;
Upvotes: 1