Reputation: 2070
I have a very basic group by query in Athena where I would like to use an alias. One can make the example work by putting the same reference in the group by, but that's not really handy when there's complex column modifications going on and logic needs to be copied in two places. Also I did that in the past and now I have a statement that doesn't work by copying over.
Problem:
SELECT
substr(accountDescriptor, 5) as account,
sum(revenue) as grossRevenue
FROM sales
GROUP BY account
This will throw an error:
alias Column 'account' cannot be resolved
The following works, so it's about the alias handling.
SELECT
substr(accountDescriptor, 5) as account,
sum(revenue) as grossRevenue
FROM sales
GROUP BY substr(accountDescriptor, 5)
Upvotes: 11
Views: 13095
Reputation: 2668
In addition to answers from kokosing and Gordon Linoff, you can use numbers that represent the location of the grouped column name in the SELECT
statement. Such approach can also provide you with better performance as described in section 8 of this AWS Blog. For example:
SELECT
substr(accountDescriptor, 5) as account,
sum(revenue) as grossRevenue
FROM sales,
GROUP BY 1
Note: numbering starts from one and not from zero.
Here 1
is somewhat aliased to account
. The main obvious downside is that if you change ordering of you columns within SELECT
than you would also need to account for that within GROUP BY
:
SELECT
sum(revenue) as grossRevenue,
substr(accountDescriptor, 5) as account
FROM sales,
GROUP BY 2
Upvotes: 11
Reputation: 5601
That is because SQL is evaluated in certain order, like table scan, filter, aggregation, projection, sort. You tried to use the result of projection as input of aggregation. In many cases it could be possible (where projection is trivial, like your case), but it such behaviour is not defined in ANSI SQL (which Presto and so Athena follows).
We see that in many cases it is very useful so, support for this might be added in future (extending ANSI SQL).
Currently, there are several ways to overcome this:
SELECT account, sum(revenue) as grossRevenue
FROM (SELECT substr(accountDescriptor, 5) as account, revenue FROM sales)
GROUP BY account
or
WITH better_sales AS (SELECT substr(accountDescriptor, 5) as account, revenue FROM sales)
SELECT account, sum(revenue) as grossRevenue
FROM better_sales
GROUP BY account
or
SELECT account, sum(revenue) as grossRevenue
FROM sales
LATERAL JOIN (SELECT substr(accountDescriptor, 5) as account)
GROUP BY account
or
SELECT substr(accountDescriptor, 5) as account, sum(revenue) as grossRevenue
FROM sales
GROUP BY 1;
Upvotes: 16
Reputation: 1270693
Hive does not allow column aliases in the GROUP BY
-- just as the SQL standard does not allow them. Some databases extend SQL to allow aliases, but this is an extension.
Just repeat the expression:
SELECT substr(accountDescriptor, 5) as account, sum(revenue) as grossRevenue
FROM sales
GROUP BY substr(accountDescriptor, 5);
Upvotes: 0