Ted
Ted

Reputation: 1062

BigQuery standard SQL: how to group by an ARRAY field

My table has two columns, id and a. Column id contains a number, column a contains an array of strings. I want to count the number of unique id for a given array, equality between arrays being defined as "same size, same string for each index".

When using GROUP BY a, I get Grouping by expressions of type ARRAY is not allowed. I can use something like GROUP BY ARRAY_TO_STRING(a, ","), but then the two arrays ["a,b"] and ["a","b"] are grouped together, and I lose the "real" value of my array (so if I want to use it later in another query, I have to split the string).

The values in this field array come from the user, so I can't assume that some character is simply never going to be there (and use it as a separator).

Upvotes: 16

Views: 44539

Answers (2)

user13996883
user13996883

Reputation: 11

Alternatively, you can use another separator than comma

ARRAY_TO_STRING(a,"|")

Upvotes: 1

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172994

Instead of GROUP BY ARRAY_TO_STRING(a, ",") use GROUP BY TO_JSON_STRING(a)

so your query will look like below

#standardsql
SELECT 
  TO_JSON_STRING(a) arr,
  COUNT(DISTINCT id) cnt
FROM `project.dataset.table`
GROUP BY arr

You can test it with dummy data like below

#standardsql
WITH `project.dataset.table` AS (
  SELECT 1 id, ["a,b", "c"] a UNION ALL
  SELECT 1, ["a","b,c"]
)
SELECT 
  TO_JSON_STRING(a) arr,
  COUNT(DISTINCT id) cnt
FROM `project.dataset.table`
GROUP BY arr  

with result as

Row     arr             cnt  
1       ["a,b","c"]     1    
2       ["a","b,c"]     1    

Update based on @Ted's comment

#standardsql
SELECT 
  ANY_VALUE(a) a,
  COUNT(DISTINCT id) cnt
FROM `project.dataset.table`
GROUP BY TO_JSON_STRING(a)

Upvotes: 28

Related Questions