Matthew Barrowclough
Matthew Barrowclough

Reputation: 73

Summarize array values from column

My job has tasked me with summarizing values of multiple arrays and I've reached a gap in my knowledge. The insight and assistance from this group is greatly appreciated.

The challenge:

I have an array of domain TLDs in each row within a single column BigQuery table. I would like to group by each TLD and return a total count by each TLD as a new table.

["biz","us","international","eu","com","co","world","us","international","eu","co","biz"]
["com","co","world"]        

Response

**TLD_Name**
biz 2
us 2
international 2
eu 2
com 2
co 3
world 1

Thanks in advance for the help.

Upvotes: 0

Views: 185

Answers (2)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

In case if tld values in each row are highly repeatable and you have really large number of rows - below might provide a little optimization by first combining/aggregating tld counts inside each row and then summarizing on whole table level (for BigQuery Standard SQL)

#standardSQL
WITH `yourproject.yourdataset.yourtable` AS (
  SELECT ["biz","us","international","eu","com","co","world","us","international","eu","co","biz"] tlds UNION ALL
  SELECT ["com","co","world","biz"]   
)
SELECT
  tld_count.tld AS tld,
  SUM(tld_count.cnt) AS cnt
FROM `yourproject.yourdataset.yourtable`,
UNNEST(ARRAY(SELECT AS STRUCT tld, COUNT(*) AS cnt FROM UNNEST(tlds) AS tld GROUP BY tld)) AS tld_count
GROUP BY tld   

Upvotes: 1

Elliott Brossard
Elliott Brossard

Reputation: 33745

Supposing that the array column is named tlds, you can run the following standard SQL query:

SELECT
  tld AS TLD_Name,
  COUNT(*) AS count
FROM YourTable
CROSS JOIN UNNEST(tlds) AS tld
GROUP BY tld;

This has the effect of "flattening" the array and getting a count associated with each TLD.

Upvotes: 2

Related Questions