Reputation: 73
My job has tasked me with summarizing values of multiple arrays and I've reached a gap in my knowledge. The insight and assistance from this group is greatly appreciated.
The challenge:
I have an array of domain TLDs in each row within a single column BigQuery table. I would like to group by each TLD and return a total count by each TLD as a new table.
["biz","us","international","eu","com","co","world","us","international","eu","co","biz"]
["com","co","world"]
Response
**TLD_Name**
biz 2
us 2
international 2
eu 2
com 2
co 3
world 1
Thanks in advance for the help.
Upvotes: 0
Views: 185
Reputation: 172993
In case if tld values in each row are highly repeatable and you have really large number of rows - below might provide a little optimization by first combining/aggregating tld counts inside each row and then summarizing on whole table level (for BigQuery Standard SQL)
#standardSQL
WITH `yourproject.yourdataset.yourtable` AS (
SELECT ["biz","us","international","eu","com","co","world","us","international","eu","co","biz"] tlds UNION ALL
SELECT ["com","co","world","biz"]
)
SELECT
tld_count.tld AS tld,
SUM(tld_count.cnt) AS cnt
FROM `yourproject.yourdataset.yourtable`,
UNNEST(ARRAY(SELECT AS STRUCT tld, COUNT(*) AS cnt FROM UNNEST(tlds) AS tld GROUP BY tld)) AS tld_count
GROUP BY tld
Upvotes: 1
Reputation: 33745
Supposing that the array column is named tlds
, you can run the following standard SQL query:
SELECT
tld AS TLD_Name,
COUNT(*) AS count
FROM YourTable
CROSS JOIN UNNEST(tlds) AS tld
GROUP BY tld;
This has the effect of "flattening" the array and getting a count associated with each TLD.
Upvotes: 2