Reputation: 195
I am wondering if it is possible to order (apply order by) for individual array values in Google BigQuery?
I am able to achieve this by applying order by
on the whole transactonal base table first, then aggregating array; but when table is too large, resource errors appear for ordering by a large table.
So I am wondering if each individual array value can be ordered by using SQL or UDF.
This was asked once Order of data in bigquery repeated records but it was 4.5 years ago.
Upvotes: 14
Views: 28851
Reputation: 1
For a good optimisation in your query you can use this structure
WITH deduped_data AS (
SELECT DISTINCT group_col, col
FROM table
)
SELECT
group_col,
ARRAY_AGG(col ORDER BY col)
FROM deduped_data
GROUP BY group_col;
Upvotes: 0
Reputation: 33745
Sure, you can use the ARRAY function. It supports an optional ORDER BY clause. You haven't provided sample data, but supposing that you have a top level array column named arr
, you can do something like this:
SELECT
col1,
col2,
ARRAY(SELECT x FROM UNNEST(arr) AS x ORDER BY x) AS arr
FROM MyTable;
This sorts the elements of arr
by their values.
If you actually have an array of a struct type, such as ARRAY<STRUCT<a INT64, b STRING>>
, you can sort by one of the struct fields:
SELECT
col1,
col2,
ARRAY(SELECT x FROM UNNEST(arr) AS x ORDER BY a) AS arr
FROM MyTable;
Upvotes: 31
Reputation: 3294
If the array is obtained after aggregation using a group by
clause, the
query can look something like this:
SELECT
ARRAY_AGG(distinct col order by col)
FROM table
GROUP BY group_col
So, no SELECT
is required.
Ref: The accepted answer didn't help. Took help from here - https://count.co/sql-resources/bigquery-standard-sql/array_agg
Upvotes: 4