Michael Ames
Michael Ames

Reputation: 2617

BigQuery SPLIT() ignores empty values

It appears that SPLIT() treats empty values as though they don't exist at all, which yields unexpected results.

For example:

SELECT
  NTH(3, SPLIT(values, ","))
FROM
  (SELECT "a,b,,d,e" as values)

returns "d", when I would expect it to return NULL. You can see how this would be problematic for several rows of comma-delimited text. One would expect the following query to return NULL and "c", but it doesn't:

SELECT
  NTH(3, SPLIT(values, ","))
FROM
  (SELECT "a,b,,d,e" as values),
  (SELECT "a,,c,d,e" as values)

Rather, it returns "d" and "d".

Is this behavior by design, can it be altered, or is there a better way to do what I'm doing?

Upvotes: 5

Views: 1625

Answers (2)

Felipe Hoffa
Felipe Hoffa

Reputation: 59175

As Mosha explains, obtaining NULLs from a SPLIT() operation is not possible. As an alternative, is getting empty spaces an option?

SELECT
  NTH(7, SPLIT(values, ","))
FROM
(SELECT REGEXP_REPLACE(values, ',,', ', ,') values FROM
  (SELECT "a,b,,d,e,,g" as values),
  (SELECT "a,,c,d,e,f,g" as values),
  (SELECT "a,,c,d,e,f,," as values),
  (SELECT "a,,c,d,e,f," as values),
)

Row f0_  
1   g    
2   g    
3        
4   null    

Upvotes: 2

Mosha Pasumansky
Mosha Pasumansky

Reputation: 14004

This is By Design behavior, and it is not specific to SPLIT function, but to REPEATED fields in general. BigQuery REPEATED fields cannot store NULLs (same behavior as in protocol buffers), therefore nothing that SPLIT does can make NULLs appear inside REPEATED fields.

Upvotes: 1

Related Questions