Reputation: 343
I try to write Hive Sql like that
SELECT count(1), substr(date, 1, 4) as year
FROM ***
GROUP BY year
But Hive cannot recognize the alias name 'year', it complains that: FAILED: SemanticException [Error 10004]: Line 1:79 Invalid table alias or column reference 'year'
One solution(Hive: SELECT AS and GROUP BY) suggest to use 'GROUP BY substr(date, 1, 4)'.
It works! However in some cases the value I want to group by may be generated from multiple lines of hive function code, it's very ugly to write code like
SELECT count(1), func1(func2(..........................)) AS something
FROM ***
GROUP BY func1(func2(..........................))
Is there any clean way in Hive to do that? Any suggestions?
Upvotes: 13
Views: 23424
Reputation: 690
Specifying the position in Group By will solve your issue. This position number in Group By works even when SET hive.groupby.orderby.position.alias=true; (Hive 0.12)
SELECT count(1), substr(date, 1, 4) as year
FROM ***
GROUP BY 2;
Upvotes: 25
Reputation: 1093
In Hive 0.11.0 and later, columns can be specified by position if hive.groupby.orderby.position.alias is set to true (the default is false).
So setting set hive.groupby.orderby.position.alias=true;
in your .hql (or .hiverc for a permanent solution) will do the trick and then you can type group by 2
for the above example.
Source: hive language manual
Upvotes: 9
Reputation: 4391
One solution that comes to mind is put GROUP BY
to outer query:
SELECT count(*) , year FROM
(
SELECT substr(date, 1, 4) as year FORM ***
) inner
GROUP BY year
GL!
Upvotes: 0