BigQuery string comparison and adjustment with replace/trim

Question

I tried to accommodate in BigQuery for inputs with bad spacing.
However, I keep getting weird results, basically saying that abc <> abc.

Can anyone explain how this is possible? I tried to adjust this by using replace and rtrim but none of them served the purpose.

Thank you

SELECT o = t,o, t, length(o) as lon_o, length(t) as len_t
from
(select replace('abc                                                                ',' ','') o,
'abc' as t)

Mikhail Berlyant · Accepted Answer

I was able easily reproduce your case by having (for example) few tabs inside that string instead of spaces -

visually they look exactly alike - but first row has 64 spaces in it whereas second row has 5 tabs and 44 spaces. In Web UI each tab occupy 2 char-places so that is why total 5 tabs + 44 spaces look like 64 spaces (exactly as in first row)

To address this - you can use REGEXP_REPLACE instead of REPLACE
Using REGEXP_REPLACE allows you to 'remove' all white spaces, as in example below (the only difference here is in second row - using REGEXP_REPLACE instead of REPLACE and use of r'\s' instead ' '

And, finally - if to get back to first example - the good way of seeing what actually makes those 8 chars vs. 3 chars in result's lon_o is to switch from Table viewtoJSON` view as below

BigQuery string comparison and adjustment with replace/trim

Answers (2)

Related Questions