Greg McFall
Greg McFall

Reputation: 219

BigQuery GroupBy with STRUCT

In BigQuery, I can successfully run the following query using standard SQL:

SELECT 
  COUNT(*) AS totalCount,
  city,
  DATE_TRUNC(timeInterval.intervalStart, YEAR) AS start
FROM 
  sandbox.CountByCity
GROUP BY 
    city, start

But it fails when I nest the start value in a STRUCT, like this...

SELECT 
  COUNT(*) AS totalCount,
  city,
  STRUCT(
    DATE_TRUNC(timeInterval.intervalStart, YEAR) AS start
  ) as timeSpan
FROM 
  sandbox.CountByCity
GROUP BY 
    city, timeSpan.start

In this case, I get the following error message:

Cannot GROUP BY field references from SELECT list alias timeSpan at [10:11]

What is the correct way to write the query so that the start value is nested within a STRUCT?

Upvotes: 11

Views: 16670

Answers (2)

Elliott Brossard
Elliott Brossard

Reputation: 33705

You can do this using ANY_VALUE. The struct value that you get is well-defined, since the value is the same for the entire group:

SELECT 
  COUNT(*) AS totalCount,
  city,
  ANY_VALUE(STRUCT(
    DATE_TRUNC(timeInterval.intervalStart, YEAR) AS start
  )) as timeSpan
FROM 
  sandbox.CountByCity
GROUP BY 
    city, DATE_TRUNC(timeInterval.intervalStart, YEAR);

Here is an example using some sample data:

WITH `sandbox.CountByCity` AS (
  SELECT 'Seattle' AS city, STRUCT(DATE '2017-12-11' AS intervalStart) AS timeInterval UNION ALL
  SELECT 'Seattle', STRUCT(DATE '2016-11-10' AS intervalStart) UNION ALL
  SELECT 'Seattle', STRUCT(DATE '2017-03-24' AS intervalStart) UNION ALL
  SELECT 'Kirkland', STRUCT(DATE '2017-02-01' AS intervalStart)
)
SELECT 
  COUNT(*) AS totalCount,
  city,
  ANY_VALUE(STRUCT(
    DATE_TRUNC(timeInterval.intervalStart, YEAR) AS start
  )) as timeSpan
FROM 
  `sandbox.CountByCity`
GROUP BY 
    city, DATE_TRUNC(timeInterval.intervalStart, YEAR);

You could also consider submitting a feature request to enable GROUP BY with STRUCT types.

Upvotes: 12

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172954

Not sure why exactly you would wanted this - but believe it is for some reason - so try below (at least formally it does what you ask)

#standardSQL
SELECT
  totalCount, 
  city, 
  STRUCT(start) timeSpan
FROM (
  SELECT 
    COUNT(*) AS totalCount,
    city,
    DATE_TRUNC(timeInterval.intervalStart, YEAR) AS start
  FROM `sandbox.CountByCity`
  GROUP BY city, start
)

Upvotes: 1

Related Questions