Amber
Amber

Reputation: 944

Hive- Error : missing EOF at 'WHERE'

I'm trying to learn Hive, especially functions like unix_timestamp and from_unixtime.

I have three tables

emp (employee table)
+---+----------------+
| id|            name|
+---+----------------+
|  1|    James Gordon|
|  2|  Harvey Bullock|
|  3| Kristen Kringle|
+---+----------------+

txn (transaction table)
+------+----------+---------+
|acc_id|trans_date|trans_amt|
+------+----------+---------+
|   101|  20180105|      951|
|   102|  20180205|      800|
|   103|  20180131|      100|
|   101|  20180112|       50|
|   102|  20180126|      800|
|   103|  20180203|      500|
+------+----------+---------+

acc (account table)
+---+------+--------+
| id|acc_id|cred_lim|
+---+------+--------+
|  1|   101|    1000|
|  2|   102|    1500|
|  3|   103|     800|
+---+------+--------+

I want to find out the people whose trans_amt exceeded their cred_lim in the month of Jan 2018.

The query I'm trying to use is

WITH tabl as
(
SELECT  e.id, e.name, a.acc_id, t.trans_amt, a.cred_lim, from_unixtime(unix_timestamp(t.trans_date, 'yyyyMMdd'), 'MMM yyyy') month
FROM emp e JOIN acc a on e.id = a.id JOIN txn t on a.acc_id = t.acc_id
)
SELECT acc_id, sum(trans_amt) total_amt
FROM tabl
GROUP BY tabl.acc_id, tabl.month
WHERE tabl.month = 'Jan 2018' AND tabl.total_amt > cred_lim;

But when I run it, I get an error saying

FAILED: ParseException line 9:2 missing EOF at 'WHERE' near 'month'

This error persists even when I change the where clause to

WHERE tabl.total_amt > cred_lim;

This makes me think the error comes from the GROUP BY clause but I can't seem to figure this out.

Could someone help me with this?

Upvotes: 1

Views: 13111

Answers (1)

Kaushik Nayak
Kaushik Nayak

Reputation: 31716

Your query has several problems.

WHERE clause should be used before GROUP BY

There is an extra ')' after GROUP BY columns

tabl.total_amt > cred_lim - This line cannot be used in where clause because the alias total_amt cannot be used before it is nested. Instead, use a HAVING clause.

I've made these changes in this query and should work for you.

  WITH tabl
    AS (
        SELECT e.id
            ,e.name
            ,a.acc_id
            ,t.trans_amt
            ,a.cred_lim
            ,from_unixtime(unix_timestamp(t.trans_date, 'yyyyMMdd'), 'MMM yyyy') month
        FROM emp e
        INNER JOIN acc a ON e.id = a.id
        INNER JOIN txn t ON a.acc_id = t.acc_id
        )
    SELECT acc_id
        ,sum(trans_amt) total_amt
    FROM tabl
    WHERE month = 'Jan 2018'
    GROUP BY acc_id
        ,month
    HAVING SUM(trans_amt) > MAX(cred_lim);

Upvotes: 1

Related Questions