Reputation: 531
I have a huge table testTable
. I am trying to clean txt
column by leaving only letters and spaces.
Here is what I do
SELECT id_cd, REGEXP_REPLACE(txt, '[^a-zA-Z ]', '', 1, 0, 'i') new_txt
FROM testTable
The above query throws String contains an untranslatable character
. I don't know which row it fails at because the table is huge.
When I limit the number of rows to some small number, it works, so the query syntax is fine.
SELECT TOP 10 id_cd, REGEXP_REPLACE(txt, '[^a-zA-Z ]', '', 1, 0, 'i') new_txt
FROM testTable
Is this a bug ? How can I spot the row it fails ? Why would it fail in the first place ?
I also tried the following :
SELECT id_cd, REGEXP_REPLACE(TRANSLATE(txt using unicode_to_latin), '[^a-zA-Z ]', '', 1, 0, 'i') new_txt
FROM testTable
But also got the same error .
Upvotes: 2
Views: 2335
Reputation: 776
We have same issue at the start of the year . You can find these characters using translate_chk function which returns 0 if success else non zero character. We handled like below
case when translate_chk(d.column using latin_to_unicode) <> '0'
then 0
else Instr(d.column , 'E3S')
end column
But in that case we were not cared about the lost data , we loaded 0 for that. Hope it helps..
Upvotes: 2