N.N.
N.N.

Reputation: 3172

Multiple subtotals - Rollup order of fields

I am trying to run a query that aggregates data, groups the results by several different fields, and extract all relevant "SubTotal" permutations. (similar to CUBE() in MSSQL)

When Using Group By Rollup(), I get only permutations according to the order of the Group By fields in the Rollup function. For example the query below (runs on a public dataset), it returns subtotal by year, or by year and month, or by year, month and medallion... but it doesn't subtotal by medallion.

SELECT
  trip_year,
  trip_month,
  medallion,
  SUM(trip_count) AS Sum_trip_count
FROM
  [nyc-tlc:yellow.Trips_ByMonth_ByMedallion]
WHERE
  medallion IN ("2R76",    "8J82",    "3B85",    "4L79",    "5D59",    "6H75",    "7P60",    "8V48",    "1H12",    "2C69",    "2F38",    "5Y86",    "5j90",    "8A75",    "8V41",    "9J24",    "9J55",    "1E13",    "1J82")
GROUP BY
  ROLLUP(trip_year,
    trip_month,
    medallion)

My question is: What should I do in order to get all different permutations of "Sub Totals" in a single query results.

Already tried: Union with similar query but with different order, it works, but not elegant (it would require too many unions). Thanks

Upvotes: 2

Views: 1773

Answers (1)

Nicholas
Nicholas

Reputation: 1714

You are correct on both counts. In BigQuery, ROLLUP respects the hierarchy treating the listed fields as a strictly ordered list. Their order will not be changed during aggregation.

The CUBE aggregate commonly found in other SQL environments is unordered and in fact aggregates every possible order/subset of its listed fields. At this time, CUBE has not been implemented in BigQuery. The workaround you suggest is also what I would suggest. UNION all result sets from ROLLUP using each permutation of its contained fields. Albeit not ideal, you should get the same results.

In short, UNIONs of several queries with different permutations of ROLLUP fields is the only way to achieve this at the moment. The downsides are as you state that this may be difficult to maintain and can be more expensive in queries.

If you would like to see CUBE implemented in BigQuery, I strongly encourage you to file a feature request on the Big Query public issue tracker. Be sure to include a thorough use case in this request.

UPDATE: To support the feature request filed by the OP, please star it and you'll receive notifications with updates.

Upvotes: 1

Related Questions