Removing replacement character � from column

Question

Based on my research so far, this character indicates bad encoding between the database and the front end. Unfortunately, I don't have any control over either of those. I'm using Teradata Studio.

How can I filter this character out? I'm trying to apply the REGEXP_SUBSTR function to a column that occasionally contains �, which throws the error "The string contains an untranslatable character".

Here is my SQL. AIRCFT_POSITN_ID is the column that contains the replacement character.

 SELECT DISTINCT AIRCFT_POSITN_ID, 
 REGEXP_SUBSTR(AIRCFT_POSITN_ID, '[0-9]+') AS AUTOROW
 FROM PROD_MAE_MNTNC_VW.FMR_DISCRPNCY_DFRL 
 WHERE DFRL_CREATE_TMS > CURRENT_DATE -25

David דודו Markovitz · Accepted Answer

Your diagnosis is correct, so first of all, you might want to check the Session Character Set (it is part of the connection definition). If it is ASCII, change it to UTF8 and you will be able to see the original characters instead of the substitute character.
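
As a quick check from within the session itself, the following statement can be run. It is only a sketch: the result should include a character set column showing what the current connection uses, but the exact column layout depends on the Teradata release and on the client.

```sql
-- Shows the attributes of the current session, including its character set.
-- Changing the value still has to be done in the connection definition.
HELP SESSION;
```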

If the character is actually part of the data, and not just a symptom of an encoding translation issue:

The substitute character, also known as SUB (decimal 26, hex 1A), is treated in a rather unusual way in Teradata.

You cannot use it directly:

select  '�';

-- [6706] The string contains an untranslatable character.

select  '1A'XC;

-- [6706] The string contains an untranslatable character.

If you are using version 14.0 or above you can generate it with the CHR function:

select  chr(26);

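If you're below version 14.0, you can generate it by translating a character that has no LATIN equivalent (here U+05D0, the Hebrew letter alef) WITH ERROR, which yields the substitute character: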

select  translate (_unicode '05D0'XC using unicode_to_latin with error);

Once you have generated the character, you can use it with OREPLACE or OTRANSLATE:

create multiset table t (i int,txt varchar(100) character set latin) unique primary index (i);

insert into t (i,txt) values (1,translate ('Hello שלום world עולם' using unicode_to_latin with error));

select * from t;

-- Hello ���� world ����

select otranslate (txt,chr(26),'') from t;

-- Hello  world 

select otranslate (txt,translate (_unicode '05D0'XC using unicode_to_latin with error),'') from t;

-- Hello  world
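
Applied to the query from the question, the same idea might look like the sketch below. It is untested against the asker's view and assumes Teradata 14.0 or above (for CHR) and that AIRCFT_POSITN_ID is a LATIN character column. The substitute character is stripped before REGEXP_SUBSTR sees the value.

```sql
-- Sketch only: remove SUB (CHR(26)) from the column before running the regex.
-- Assumes version 14.0+ and a LATIN column; below 14.0, swap CHR(26) for the
-- TRANSLATE ... WITH ERROR expression shown above.
SELECT DISTINCT
       AIRCFT_POSITN_ID,
       REGEXP_SUBSTR(OTRANSLATE(AIRCFT_POSITN_ID, CHR(26), ''), '[0-9]+') AS AUTOROW
FROM   PROD_MAE_MNTNC_VW.FMR_DISCRPNCY_DFRL
WHERE  DFRL_CREATE_TMS > CURRENT_DATE - 25;
```

The first output column still returns the raw value, so it may continue to display � in the client. Wrap it in the same OTRANSLATE call if the cleaned value is wanted there as well.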

BTW, there are two versions of OTRANSLATE and OREPLACE:

The functions under syslib work with LATIN.
The functions under TD_SYSFNLIB work with UNICODE.
