Reputation: 9441
I have a category column with a string containing subcategory fields in variable positions separated by a "|". The location of each subcategory is dependent on the number of items in the string. For example:
category subcat1 subcat2 subcat3
a|b|c b c a
x|y|a|b b null a
So, to solve for one category, I have:
SELECT
a.category AS category,
case
WHEN COUNT(SPLIT(a.category, "|")) = 4 then nth(4, SPLIT(a.category, "|"))
WHEN COUNT(SPLIT(a.category, "|")) = 3 then nth(2, SPLIT(a.category, "|"))
WHEN COUNT(SPLIT(a.category, "|")) = 2 then nth(2, SPLIT(a.category, "|"))
else null
end as subcat1,
--nth(2, SPLIT(a.category, "|")) as x --uncomment for success. see below
FROM
[interim_groups.articles_unique] as a
Running this fails with:
SELECT clause has mix of aggregations 'subcat1' and fields 'category' without GROUP BY clause
Now I don't want a group by
clause and it doesn't make sense to have one, but if I include it it starts complaining scoped aggregations, which seems to be going in the wrong direction.
The same occurs if I use an if
statement instead of a case
statement.
Now here is the strange bit. If I have the commented line (or alternativelylast(SPLIT(a.category, "|")) as x
) in my query, the query passes flawlessly.
Is this a bug? My query looks right, and having an extra column in my query somehow making it pass is strange.
Is there a better way to fix this than just leaving in an unnecessary column to stabilize the query?
Upvotes: 3
Views: 779
Reputation: 59175
The query is missing the 'WITHIN RECORD' keyword.
SELECT
a.category AS category,
case
WHEN COUNT(SPLIT(a.category, "|")) = 4 then nth(4, SPLIT(a.category, "|"))
WHEN COUNT(SPLIT(a.category, "|")) = 3 then nth(2, SPLIT(a.category, "|"))
WHEN COUNT(SPLIT(a.category, "|")) = 2 then nth(2, SPLIT(a.category, "|"))
else null
end WITHIN RECORD as subcat1 ,
FROM (SELECT category FROM
(SELECT 'a|b|c' category), (SELECT 'a|b' category), (SELECT 'a|b|c|d' category)) a
See the documentation for WITHIN:
The WITHIN keyword specifically works with aggregate functions to aggregate across children and repeated fields within records and nested fields. When you specify the WITHIN keyword, you need to specify the scope over which you want to aggregate - WITHIN RECORD: Aggregates data in the repeated values within the record.
https://cloud.google.com/bigquery/docs/data#within
Upvotes: 5
Reputation: 172993
I think using String functions you can relatively easily extract last component in pipe separated string
Meantime, below version with use of Regular expression functions - it is a little more powerfull for potentially more complex scenarious
SELECT
a.category AS category,
CASE
WHEN LENGTH(REGEXP_REPLACE(a.category, r"[^|]", "")) = 4
THEN REGEXP_EXTRACT(a.category, r'\|\w+\|\w+\|\w+\|(\w+)')
WHEN LENGTH(REGEXP_REPLACE(a.category, r"[^|]", "")) = 3
THEN REGEXP_EXTRACT(a.category, r'\|\w+\|\w+\|(\w+)')
WHEN LENGTH(REGEXP_REPLACE(a.category, r"[^|]", "")) = 2
THEN REGEXP_EXTRACT(a.category, r'\|\w+\|(\w+)')
ELSE NULL
END AS subcat1
FROM your_table a
No grouping, no unnecessary column as asked in question!
Upvotes: 2