Reputation: 944
I'm trying to learn Hive, especially functions like unix_timestamp
and from_unixtime
.
I have three tables
emp (employee table)
+---+----------------+
| id| name|
+---+----------------+
| 1| James Gordon|
| 2| Harvey Bullock|
| 3| Kristen Kringle|
+---+----------------+
txn (transaction table)
+------+----------+---------+
|acc_id|trans_date|trans_amt|
+------+----------+---------+
| 101| 20180105| 951|
| 102| 20180205| 800|
| 103| 20180131| 100|
| 101| 20180112| 50|
| 102| 20180126| 800|
| 103| 20180203| 500|
+------+----------+---------+
acc (account table)
+---+------+--------+
| id|acc_id|cred_lim|
+---+------+--------+
| 1| 101| 1000|
| 2| 102| 1500|
| 3| 103| 800|
+---+------+--------+
I want to find out the people whose trans_amt
exceeded their cred_lim
in the month of Jan 2018.
The query I'm trying to use is
WITH tabl as
(
SELECT e.id, e.name, a.acc_id, t.trans_amt, a.cred_lim, from_unixtime(unix_timestamp(t.trans_date, 'yyyyMMdd'), 'MMM yyyy') month
FROM emp e JOIN acc a on e.id = a.id JOIN txn t on a.acc_id = t.acc_id
)
SELECT acc_id, sum(trans_amt) total_amt
FROM tabl
GROUP BY tabl.acc_id, tabl.month
WHERE tabl.month = 'Jan 2018' AND tabl.total_amt > cred_lim;
But when I run it, I get an error saying
FAILED: ParseException line 9:2 missing EOF at 'WHERE' near 'month'
This error persists even when I change the where clause to
WHERE tabl.total_amt > cred_lim;
This makes me think the error comes from the GROUP BY
clause but I can't seem to figure this out.
Could someone help me with this?
Upvotes: 1
Views: 13111
Reputation: 31716
Your query has several problems.
WHERE
clause should be used before GROUP BY
There is an extra ')'
after GROUP BY
columns
tabl.total_amt > cred_lim
- This line cannot be used in where
clause because the alias total_amt
cannot be used before it is
nested. Instead, use a HAVING
clause.
I've made these changes in this query and should work for you.
WITH tabl
AS (
SELECT e.id
,e.name
,a.acc_id
,t.trans_amt
,a.cred_lim
,from_unixtime(unix_timestamp(t.trans_date, 'yyyyMMdd'), 'MMM yyyy') month
FROM emp e
INNER JOIN acc a ON e.id = a.id
INNER JOIN txn t ON a.acc_id = t.acc_id
)
SELECT acc_id
,sum(trans_amt) total_amt
FROM tabl
WHERE month = 'Jan 2018'
GROUP BY acc_id
,month
HAVING SUM(trans_amt) > MAX(cred_lim);
Upvotes: 1