Why would REGEXP_REPLACE throw an untranslatable character ? Is this a bug?

Question

I have a huge table testTable. I am trying to clean txt column by leaving only letters and spaces.

Here is what I do

SELECT id_cd, REGEXP_REPLACE(txt, '[^a-zA-Z ]', '', 1, 0, 'i') new_txt 
FROM testTable

The above query throws String contains an untranslatable character. I don't know which row it fails at because the table is huge.

When I limit the number of rows to some small number, it works, so the query syntax is fine.

 SELECT TOP 10 id_cd, REGEXP_REPLACE(txt, '[^a-zA-Z ]', '', 1, 0, 'i') new_txt 
 FROM testTable

Is this a bug ? How can I spot the row it fails ? Why would it fail in the first place ?

I also tried the following :

SELECT id_cd, REGEXP_REPLACE(TRANSLATE(txt using unicode_to_latin), '[^a-zA-Z ]', '', 1, 0, 'i') new_txt 
 FROM testTable

But also got the same error .

anwaar_hell · Accepted Answer

We have same issue at the start of the year . You can find these characters using translate_chk function which returns 0 if success else non zero character. We handled like below

case when  translate_chk(d.column using  latin_to_unicode)  <> '0'
  then 0
  else Instr(d.column  , 'E3S')
  end column

But in that case we were not cared about the lost data , we loaded 0 for that. Hope it helps..

Why would REGEXP_REPLACE throw an untranslatable character ? Is this a bug?

Answers (1)

Related Questions