Reputation: 251
I am running a simple query in BigQuery.
SELECT
p.title,
p.id,
SUM(view) Views,
CAST(EXTRACT(DATE from pv.data_tstamp) as string) dateInserted
FROM products p
LEFT JOIN productViews pv
ON p.sku=pv.sku WHERE
pv.data_tstamp >= TIMESTAMP('2022-06-20 00:00:00') AND
pv.data_tstamp <= TIMESTAMP('2022-06-29 00:00:00')
GROUP BY p.title, p.id,
CAST(EXTRACT(DATE from pv.data_tstamp) as string) order by p.Id
I am getting an error:
SELECT list expression references pv.data_tstamp which is neither grouped nor aggregated at [5:28]
pv.data_stamp is grouped by as can be seen in the syntax.
if i remove order by p.id
It works fine, but I need the query to sort the result by Id.
Upvotes: 0
Views: 925
Reputation: 12274
First of all, working queries will be like below:
SELECT p.title,
p.id,
SUM(view) Views,
CAST(EXTRACT(DATE from pv.data_tstamp) as string) dateInserted
FROM products p
LEFT JOIN productViews pv
ON p.sku=pv.sku WHERE
pv.data_tstamp >= TIMESTAMP('2022-06-20 00:00:00') AND
pv.data_tstamp <= TIMESTAMP('2022-06-29 00:00:00')
GROUP BY p.title, p.id, dateInserted -- using a column alias
ORDER BY p.Id;
or using ordinal positions,
GROUP BY 1, 2, 4 -- using ordinal positions
ORDER BY p.Id;
If you remove an alias dateInserted, it will work also along with ORDER BY but a column name becomes anonymous.
SELECT ...
CAST(EXTRACT(DATE from pv.data_tstamp) as string) -- removed alias
FROM products p
LEFT JOIN productViews pv
ON p.sku=pv.sku WHERE ...
GROUP BY p.title, p.id, CAST(EXTRACT(DATE FROM pv.data_tstamp) AS STRING)
ORDER BY p.Id;
What I don't understand is why ORDER BY makes an error as in the original post. My best quess is that by adding ORDER BY, column aliases in SELECT list are freezed and it affects the scope of aliases in GROUP BY list.
Any thought on this ?
Upvotes: 1
Reputation: 1368
From the BQ ORDER BY doc:
Column aliases from a FROM clause or SELECT list are allowed.
If a query contains aliases in the SELECT clause, those aliases override names in the corresponding FROM clause.The data type of expression must be orderable.
In your case you have set dateInserted as the alias name of the functional output but then again in the group by
clause you have used the same functional expression instead of the alias name.
This will work when you don't have any order by
clause , but as soon as you use order by
the query will only recognize the either selected column names or their alias names.
Thus in the Select
statement of your query the pv.data_tstamp column name is no longer available as it has been overridden. Only title, id, views, dateInserted (and also the other columns which are not selected)column names are visible.
To remove the error, you need to either use ordinals
or alias name
in group by and order by clause
Eg:
SELECT
p.title,
p.id,
SUM(view) Views,
CAST(EXTRACT(DATE from pv.data_tstamp) as string) dateInserted
FROM products p
LEFT JOIN productViews pv
ON p.sku=pv.sku WHERE
pv.data_tstamp >= TIMESTAMP('2022-06-20 00:00:00') AND
pv.data_tstamp <= TIMESTAMP('2022-06-29 00:00:00')
GROUP BY p.title, p.id,dateInserted -- replacing with ordinals will also work
order by p.Id
You can play with this query to check what are the ways you can use both clauses together,
Upvotes: 1