Ipek
Ipek

Reputation: 23

how to calculate weighted average in hql?

I am new in HQL and I have a table where:

  1. column A corresponds to merchant name,
  2. column B >> merchant ID
  3. column C >> promised shipment (in business days) [named shipmentday]
  4. column D >> nr of SKUs that merchant has on shipmentday basis [named listing_count]
  5. column E >> corresponds to the total number of SKUs of that merchant [named total_listingcount]

enter image description here

Is there an easy query to calculate weighted shipmentday average per merchant with HiveQL like I did on Excel? (Since shipmentdays may vary, not every merchant has the same number of rows)

Upvotes: 0

Views: 478

Answers (1)

Gusti Adli
Gusti Adli

Reputation: 1213

You need to use window function of sum() over(order by time rows between preceding 4 and current row).

SELECT
    merchantName,
    merchantID,
    shipmentday,
    listing_count,
    sum(shipmentday * listing_count) over(order by period rows between preceding 4 and current row) / sum(listing_count) over(order by period rows between preceding 4 and current row) as weighed_ma
FROM (
    SELECT *, ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS period
    FROM your_table
);

Note: You can ignore the subquery if you have the time column in your data. I'm generating period because you don't have time column in your sample.

If window functions throws an error, then I'm assuming your Hive version doesn't support the required windows function. Alternatively you can use a CROSS JOIN. CROSS JOIN maps all row from the origin table to all rows in the target table (think of it as an left join without any key matching).

WITH a AS (
    SELECT *, ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS period
    FROM your_table
)

SELECT
    merchantName,
    merchantID,
    shipmentday,
    listing_count,
    sum(shipmentday_b * listing_count_b) / sum(listing_count_b) AS weighted_ma
FROM (
    SELECT 
        a.*, 
        b.shipmentday AS shipmentday_b, 
        b.listing_count AS listing_count_b
    FROM a
    CROSS JOIN a AS b
    WHERE b.rn BETWEEN a.rn - 4 and a.rn
) a
GROUP BY merchantName, merchantID, shipmentday,listing_count;

Upvotes: 1

Related Questions