Reputation: 2617
It appears that SPLIT() treats empty values as though they don't exist at all, which yields unexpected results.
For example:
SELECT
NTH(3, SPLIT(values, ","))
FROM
(SELECT "a,b,,d,e" as values)
returns "d", when I would expect it to return NULL. You can see how this would be problematic for several rows of comma-delimited text. One would expect the following query to return NULL and "c", but it doesn't:
SELECT
NTH(3, SPLIT(values, ","))
FROM
(SELECT "a,b,,d,e" as values),
(SELECT "a,,c,d,e" as values)
Rather, it returns "d" and "d".
Is this behavior by design, can it be altered, or is there a better way to do what I'm doing?
Upvotes: 5
Views: 1625
Reputation: 59175
As Mosha explains, obtaining NULLs from a SPLIT() operation is not possible. As an alternative, is getting empty spaces an option?
SELECT
NTH(7, SPLIT(values, ","))
FROM
(SELECT REGEXP_REPLACE(values, ',,', ', ,') values FROM
(SELECT "a,b,,d,e,,g" as values),
(SELECT "a,,c,d,e,f,g" as values),
(SELECT "a,,c,d,e,f,," as values),
(SELECT "a,,c,d,e,f," as values),
)
Row f0_
1 g
2 g
3
4 null
Upvotes: 2
Reputation: 14004
This is By Design behavior, and it is not specific to SPLIT function, but to REPEATED fields in general. BigQuery REPEATED fields cannot store NULLs (same behavior as in protocol buffers), therefore nothing that SPLIT does can make NULLs appear inside REPEATED fields.
Upvotes: 1