masospaghetti
masospaghetti

Reputation: 348

Removing replacement character � from column

Based on my research so far this character indicates bad encoding between the database and front end. Unfortunately, I don't have any control over either of those. I'm using Teradata Studio.

How can I filter this character out? I'm trying to perform a REGEX_SUBSTR function on a column that occasionally contains , which throws the error "The string contains an untranslatable character".

Here is my SQL. AIRCFT_POSITN_ID is the column that contains the replacement character.

 SELECT DISTINCT AIRCFT_POSITN_ID, 
 REGEXP_SUBSTR(AIRCFT_POSITN_ID, '[0-9]+') AS AUTOROW
 FROM PROD_MAE_MNTNC_VW.FMR_DISCRPNCY_DFRL 
 WHERE DFRL_CREATE_TMS > CURRENT_DATE -25

Upvotes: 3

Views: 9016

Answers (2)

masospaghetti
masospaghetti

Reputation: 348

In addition to Dudu's excellent answer above, I wanted to add the following now that I've encountered the issue again and had more time to experiment. The following SELECT command produced an untranslatable character:

SELECT IDENTIFY FROM PROD_MAE_MNTNC_VW.SCHD_MNTNC;

IDENTIFY
24FEB1747659193DC330A163DCL�ORD

Trying to perform a REGEXP_REPLACE or OREPLACE directly on this character produces an error:

Failed [6706 : HY000] The string contains an untranslatable character. 

I changed the CHARSET property in my Teradata connection from UTF8 to ASCII and I could now see the offending character, looks like a tab

IDENTIFY

enter image description here

Using the TRANSLATE_CHK command using this specific conversion succeeds and identifies the position of the offending character (Note that this does not work using the UTF8 charset):

TRANSLATE_CHK(IDENTIFY USING KANJI1_SBC_TO_UNICODE) AS BADCHAR

BADCHAR
28

Now this character can be dealt with using some CASE statements to remove the bad character and retain the remainder of the string:

CASE WHEN TRANSLATE_CHK(IDENTIFY USING KANJI1_SBC_TO_UNICODE) = 0 THEN IDENTIFY
ELSE SUBSTR(IDENTIFY, 1, TRANSLATE_CHK(IDENTIFY USING KANJI1_SBC_TO_UNICODE)-1)
END AS IDENTIFY

Hopes this helps someone out.

Upvotes: 2

David דודו Markovitz
David דודו Markovitz

Reputation: 44941

Your diagnostic is correct, so first of all, you might want to check the Session Character Set (it is part of the connection definition). If it is ASCII change it to UTF8 and you will be able to see the original characters instead of the substitute character.


And in case the character is indeed part of the data and not just an indication for encoding translations issues:

The substitute character AKA SUB (DEC: 26 HEX: 1A) is quite unique in Teradata.

you cannot use it directly -

select  '�';

-- [6706] The string contains an untranslatable character.

select  '1A'XC;

-- [6706] The string contains an untranslatable character.

If you are using version 14.0 or above you can generate it with the CHR function:

select  chr(26);

If you're below version 14.0 you can generate it like this:

select  translate (_unicode '05D0'XC using unicode_to_latin with error);

Once you have generated the character you can now use it with REPLACE or OTRANSLATE

create multiset table t (i int,txt varchar(100) character set latin) unique primary index (i);

insert into t (i,txt) values (1,translate ('Hello שלום world עולם' using unicode_to_latin with error));

select * from t;

-- Hello ���� world ����

select otranslate (txt,chr(26),'') from t;

-- Hello  world 

select otranslate (txt,translate (_unicode '05D0'XC using unicode_to_latin with error),'') from t;

-- Hello  world 

BTW, there are 2 versions for OTRANSLATE and OREPLACE:

  • The functions under syslib works with LATIN.
  • the functions under TD_SYSFNLIB works with UNICODE.

Upvotes: 5

Related Questions