hnandarusdy
hnandarusdy

Reputation: 452

BigQuery to aggregate with array

Given 2 tables below

WITH id_tags AS
(
  SELECT 1 as ID, ['Michael','New York'] as tags UNION ALL
  SELECT 2 as ID, ['Michael','Jon', 'Texas'] as tags UNION ALL
  SELECT 3 as ID, ['abcd','Washington'] as tags UNION ALL
  SELECT 4 as ID, ['Washington','New York','Michael'] as tags UNION ALL
  SELECT 5 as ID, ['America','Michael'] as tags UNION ALL
  SELECT 6 as ID, ['Washington','Michael', 'defg'] as tags UNION ALL
  SELECT 7 as ID, ['America','Burqq','defg'] as tags
),
tagsCategory AS(
  SELECT 'Michael' as tags, 'Person' as category UNION ALL
  SELECT 'Burqq' as tags, 'Person' as category  UNION ALL
  SELECT 'New York' as tags, 'City' as category UNION ALL
  SELECT 'Washington' as tags, 'City' as category UNION ALL
  SELECT 'Texas' as tags, 'City' as category
)

I want to display an exception list. The exception list is when id_tags has 0 or more than 1 person and when id_tags has 0 or more than 1 city. (basically, it is only 1 per category)

Expected results:

----------------------------------
ID |  Reason                      
----------------------------------
2  |    Person more than 1        
3  |    Only 1 city  
4  |    More than 1 city detected    
5  |    No City Detected 
7  |    No City Detected    

Explanation

ID 1 is totally fine, it has 1 City and has 1 person therefore it is not in the list
ID 2 it is because it has 2 people 
ID 3 because it has 1 city. we can ignore 'abcd' as it is not in the tags category
ID 4 is is because it has 2 cities
ID 5 because America is not city and therefore no city in that row
ID 6 is fine. We can ignore 'defg' as it is not in the list
ID 7 (the same reason as ID 5)

It looks easy for me at the first glance, however, I always find bugs in my query. Do you have any suggestion/help me with the logic or even query example?

I use standardBigQuery.

Upvotes: 1

Views: 237

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172974

WITH id_tags AS
(
  SELECT 1 AS ID, ['Michael','New York'] AS tags UNION ALL
  SELECT 2 AS ID, ['Michael','Jon', 'Texas'] AS tags UNION ALL
  SELECT 3 AS ID, ['abcd','Washington'] AS tags UNION ALL
  SELECT 4 AS ID, ['Washington','New York','Michael'] AS tags UNION ALL
  SELECT 5 AS ID, ['America','Michael'] AS tags UNION ALL
  SELECT 6 AS ID, ['Washington','Michael', 'defg'] AS tags UNION ALL
  SELECT 7 AS ID, ['America','Burqq','defg'] AS tags
),
tagsCategory AS(
  SELECT 'Michael' AS tags, 'Person' AS category UNION ALL
  SELECT 'Jon' AS tags, 'Person' AS category UNION ALL
  SELECT 'Burqq' AS tags, 'Person' AS category  UNION ALL
  SELECT 'New York' AS tags, 'City' AS category UNION ALL
  SELECT 'Washington' AS tags, 'City' AS category UNION ALL
  SELECT 'Texas' AS tags, 'City' AS category
)
SELECT id, 
  COUNTIF(category = 'City') AS cities, 
  COUNTIF(category = 'Person') AS names 
FROM id_tags, UNNEST(tags) tag
JOIN tagsCategory ON tag = tagsCategory.tags
GROUP BY id
HAVING NOT cities = 1 OR NOT names = 1

Upvotes: 3

Related Questions