Reputation: 674
I have a transaction table t1
in Hive that looks like this:
store_id cust_id zip_code transaction_count spend
1000 100 123 3 50
2000 200 456 2 20
I'm trying to compute some metrics using this table as follows:
SELECT
store_id,
zip_code,
SUM(transaction_count) OVER (PARTITION BY store_id, zip_code) / COUNT(*) OVER(PARTITION BY store_id, zip_code) AS avg_transactions,
SUM(spend) OVER(PARTITION BY store_id, zip_code)/SUM(transaction_count) OVER(PARTITION BY store_id, zip_code) AS avg_purchase
FROM
t1
GROUP BY
1, 2
Hive is throwing an error
SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Expression not in GROUP BY key 'spend'
But the column spend
is being used with an aggregation function (sum), so I cannot group by this column. How can I fix this error?
Upvotes: 0
Views: 1598
Reputation: 1269753
Hmmm . . . I think you want:
SELECT store_id, zip_code, AVG(transaction_count) as avg_transactions,
SUM(spend) / SUM(transaction_count) as avg_purchase
FROM t1
GROUP BY 1, 2;
I don't see why you are using window functions.
Upvotes: 1
Reputation: 222462
Window functions do not need group by
. You either want:
SELECT
store_id,
zip_code,
AVG(transaction_count) OVER(PARTITION BY store_id, zip_code) AS avg_transactions,
SUM(spend) OVER(PARTITION BY store_id, zip_code)
/ SUM(transaction_count) OVER(PARTITION BY store_id, zip_code) AS avg_purchase
FROM t1
Note that the first expression can be shortened with AVG()
.
This brings as many rows as in the original table. If, on the other hand, you want just one row per store and zip code, then use regular aggregate functions with GROUP BY
:
SELECT
store_id,
zip_code,
AVG(transaction_count) AS avg_transactions,
SUM(spend) / SUM(transaction_count) AS avg_purchase
FROM t1
GROUP BY store_id, zip_code
Upvotes: 3