cam
cam

Reputation: 251

BigQuery giving error of expression not grouped by when order by is included in the sql statement

I am running a simple query in BigQuery.

SELECT
        p.title,
        p.id,
        SUM(view) Views,
        CAST(EXTRACT(DATE from pv.data_tstamp) as string) dateInserted
    FROM  products  p
    LEFT JOIN  productViews  pv
        ON p.sku=pv.sku WHERE
        pv.data_tstamp >= TIMESTAMP('2022-06-20 00:00:00') AND
        pv.data_tstamp  <= TIMESTAMP('2022-06-29 00:00:00') 
        GROUP BY p.title, p.id,
        CAST(EXTRACT(DATE from pv.data_tstamp) as string) order by p.Id

I am getting an error:

SELECT list expression references pv.data_tstamp which is neither grouped nor aggregated at [5:28]

pv.data_stamp is grouped by as can be seen in the syntax.

if i remove order by p.id It works fine, but I need the query to sort the result by Id.

Upvotes: 0

Views: 925

Answers (2)

Jaytiger
Jaytiger

Reputation: 12274

First of all, working queries will be like below:

SELECT p.title,
       p.id,
       SUM(view) Views,
       CAST(EXTRACT(DATE from pv.data_tstamp) as string) dateInserted 
  FROM products  p
  LEFT JOIN  productViews  pv
    ON p.sku=pv.sku WHERE
       pv.data_tstamp >= TIMESTAMP('2022-06-20 00:00:00') AND
       pv.data_tstamp  <= TIMESTAMP('2022-06-29 00:00:00') 
 GROUP BY p.title, p.id, dateInserted -- using a column alias
 ORDER BY p.Id;

or using ordinal positions,

 GROUP BY 1, 2, 4 -- using ordinal positions
 ORDER BY p.Id;

If you remove an alias dateInserted, it will work also along with ORDER BY but a column name becomes anonymous.

SELECT ...
       CAST(EXTRACT(DATE from pv.data_tstamp) as string) -- removed alias
  FROM products  p
  LEFT JOIN  productViews  pv
    ON p.sku=pv.sku WHERE ...
 GROUP BY p.title, p.id, CAST(EXTRACT(DATE FROM pv.data_tstamp) AS STRING)
 ORDER BY p.Id;

What I don't understand is why ORDER BY makes an error as in the original post. My best quess is that by adding ORDER BY, column aliases in SELECT list are freezed and it affects the scope of aliases in GROUP BY list.

Any thought on this ?

Upvotes: 1

Sayan Bhattacharya
Sayan Bhattacharya

Reputation: 1368

From the BQ ORDER BY doc:

Column aliases from a FROM clause or SELECT list are allowed.
If a query contains aliases in the SELECT clause, those aliases override names in the corresponding FROM clause.The data type of expression must be orderable.

In your case you have set dateInserted as the alias name of the functional output but then again in the group by clause you have used the same functional expression instead of the alias name.
This will work when you don't have any order by clause , but as soon as you use order by the query will only recognize the either selected column names or their alias names.

Thus in the Select statement of your query the pv.data_tstamp column name is no longer available as it has been overridden. Only title, id, views, dateInserted (and also the other columns which are not selected)column names are visible.

To remove the error, you need to either use ordinals or alias name in group by and order by clause

Eg:

SELECT
    p.title,
    p.id,
    SUM(view) Views,
    CAST(EXTRACT(DATE from pv.data_tstamp) as string) dateInserted
FROM  products  p
LEFT JOIN  productViews  pv
    ON p.sku=pv.sku WHERE
    pv.data_tstamp >= TIMESTAMP('2022-06-20 00:00:00') AND
    pv.data_tstamp  <= TIMESTAMP('2022-06-29 00:00:00') 
    GROUP BY p.title, p.id,dateInserted  -- replacing with ordinals will also work
    order by p.Id

You can play with this query to check what are the ways you can use both clauses together,

Upvotes: 1

Related Questions