Emily
Emily

Reputation: 531

Why would REGEXP_REPLACE throw an untranslatable character ? Is this a bug?

I have a huge table testTable. I am trying to clean txt column by leaving only letters and spaces.

Here is what I do

SELECT id_cd, REGEXP_REPLACE(txt, '[^a-zA-Z ]', '', 1, 0, 'i') new_txt 
FROM testTable 

The above query throws String contains an untranslatable character. I don't know which row it fails at because the table is huge.

When I limit the number of rows to some small number, it works, so the query syntax is fine.

 SELECT TOP 10 id_cd, REGEXP_REPLACE(txt, '[^a-zA-Z ]', '', 1, 0, 'i') new_txt 
 FROM testTable 

Is this a bug ? How can I spot the row it fails ? Why would it fail in the first place ?


I also tried the following :

SELECT id_cd, REGEXP_REPLACE(TRANSLATE(txt using unicode_to_latin), '[^a-zA-Z ]', '', 1, 0, 'i') new_txt 
 FROM testTable 

But also got the same error .

Upvotes: 2

Views: 2335

Answers (1)

anwaar_hell
anwaar_hell

Reputation: 776

We have same issue at the start of the year . You can find these characters using translate_chk function which returns 0 if success else non zero character. We handled like below

case when  translate_chk(d.column using  latin_to_unicode)  <> '0'
  then 0
  else Instr(d.column  , 'E3S')
  end column

But in that case we were not cared about the lost data , we loaded 0 for that. Hope it helps..

Upvotes: 2

Related Questions