supernova
supernova

Reputation: 2070

AWS Athena ALIAS in Group By does not get resolved

I have a very basic group by query in Athena where I would like to use an alias. One can make the example work by putting the same reference in the group by, but that's not really handy when there's complex column modifications going on and logic needs to be copied in two places. Also I did that in the past and now I have a statement that doesn't work by copying over.

Problem:

SELECT 
    substr(accountDescriptor, 5) as account, 
    sum(revenue) as grossRevenue 
FROM sales 
GROUP BY account

This will throw an error:

alias Column 'account' cannot be resolved

The following works, so it's about the alias handling.

SELECT 
    substr(accountDescriptor, 5) as account, 
    sum(revenue) as grossRevenue 
FROM sales 
GROUP BY substr(accountDescriptor, 5)

Upvotes: 11

Views: 13095

Answers (3)

Ilya Kisil
Ilya Kisil

Reputation: 2668

In addition to answers from kokosing and Gordon Linoff, you can use numbers that represent the location of the grouped column name in the SELECT statement. Such approach can also provide you with better performance as described in section 8 of this AWS Blog. For example:

SELECT
    substr(accountDescriptor, 5) as account,
    sum(revenue) as grossRevenue
FROM sales,
GROUP BY 1

Note: numbering starts from one and not from zero.

Here 1 is somewhat aliased to account. The main obvious downside is that if you change ordering of you columns within SELECT than you would also need to account for that within GROUP BY:

SELECT
    sum(revenue) as grossRevenue,
    substr(accountDescriptor, 5) as account
FROM sales,
GROUP BY 2

Upvotes: 11

kokosing
kokosing

Reputation: 5601

That is because SQL is evaluated in certain order, like table scan, filter, aggregation, projection, sort. You tried to use the result of projection as input of aggregation. In many cases it could be possible (where projection is trivial, like your case), but it such behaviour is not defined in ANSI SQL (which Presto and so Athena follows).

We see that in many cases it is very useful so, support for this might be added in future (extending ANSI SQL).

Currently, there are several ways to overcome this:

SELECT account, sum(revenue) as grossRevenue 
FROM (SELECT substr(accountDescriptor, 5) as account, revenue FROM sales)
GROUP BY account

or

WITH better_sales AS (SELECT substr(accountDescriptor, 5) as account, revenue FROM sales)
SELECT account, sum(revenue) as grossRevenue 
FROM better_sales
GROUP BY account

or

SELECT account, sum(revenue) as grossRevenue 
FROM sales
LATERAL JOIN (SELECT substr(accountDescriptor, 5) as account)
GROUP BY account

or

SELECT substr(accountDescriptor, 5) as account, sum(revenue) as grossRevenue
FROM sales
GROUP BY 1;

Upvotes: 16

Gordon Linoff
Gordon Linoff

Reputation: 1270693

Hive does not allow column aliases in the GROUP BY -- just as the SQL standard does not allow them. Some databases extend SQL to allow aliases, but this is an extension.

Just repeat the expression:

SELECT substr(accountDescriptor, 5) as account, sum(revenue) as grossRevenue
FROM sales
GROUP BY substr(accountDescriptor, 5);

Upvotes: 0

Related Questions