Piyush Upadhyay
Piyush Upadhyay

Reputation: 41

Error: String contains an untranslatable character - TERADATA (for REGEXP_REPLACE operation)

I need to clean one character column and for that I am using REGEXP_REPLACE function in Teradata 14.

The same piece of code worked for some other data source (having the same LATIN encoding).

The data definition using show table has given me below format of the data:

CREATE SET TABLE pp_oap_cj_t.dc_loss_fdr_kn ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO
( PARENT_ID DECIMAL(38,0),
 FS_MRCH_NM VARCHAR(25) CHARACTER SET LATIN NOT CASESPECIFIC
) PRIMARY INDEX ( PARENT_ID );

The query I am performing is as below:

CREATE TABLE pp_oap_pb_piyush_t.CHECKMERCHANT1 AS (
 SELECT 
 FS_MRCH_NM,
REGEXP_REPLACE(trim(Upper(trim(REGEXP_REPLACE( (FS_MRCH_NM ) , '[^a-z]',' ',1,0,'i'))) ), '[[:space:]]+',' ',1,0,'i')   as cleaned_merchant
 FROM pp_oap_pb_piyush_t.CHECKMERCHANT)
  WITH DATA PRIMARY INDEX (FS_MRCH_NM);

Error

CREATE TABLE Failed. 6706:  The string contains an untranslatable character.

I need a quick turnaround this bottleneck.

Help is really appreciated ! Thanks !!!!

Upvotes: 4

Views: 32610

Answers (4)

NathanH
NathanH

Reputation: 101

UPDATE CFSYSUAT.Metadata
SET MetadataTxt = 
CASE WHEN TRANSLATE_CHK(MetadataTxt USING UNICODE_TO_LATIN) > 0
THEN SUBSTRING(MetadataTxt,1,TRANSLATE_CHK(MetadataTxt USING UNICODE_TO_LATIN) - 1) || 
     SUBSTRING(MetadataTxt,TRANSLATE_CHK(MetadataTxt USING UNICODE_TO_LATIN) + 1) 
ELSE MetadataTxt END;

Had some luck with TRANSLATE_CHK. It returns the position of the offending character. Used it with SUBSTRING to remove the offending character. If the text contains multiple bad characters you have to run the update multiple times, each pass will correct another bad character.

HTH, Nathan

Upvotes: 1

Younes
Younes

Reputation: 1

TRANSLATE(OREPLACE(TRANSLATE(item_name USING LATIN_TO_UNICODE WITH ERROR),'|','') USING UNICODE_TO_LATIN WITH ERROR) AS LBL

Upvotes: -1

Rishabh Sharma
Rishabh Sharma

Reputation: 33

A non-Unicode-compatible version of oreplace is installed in our syslib, and a Unicode-compatible version is in our td_sysfnlib. When the database is not specified, syslib is used before td_sysfnlib. So forcing TD to use the td_sysfnlib version of oreplace solved the problem.

Here's the code used:

SELECT td_sysfnlib.OREPLACE(item_name,'|','') FROM databaseB.sales;

I hope that helps anyone else who's running into the same issue!

Upvotes: 0

Kiran
Kiran

Reputation: 3315

REGEXP_REPLACE under the hood converts character set Latin to Unicode. You have defined your variable as character set Latin. You see the error when data has something which cannot be converted from Latin to Unicode. Best thing is to fix your DDL to have character set as Unicode instead of Latin. something like TRANSLATE(FS_MRCH_NM USING LATIN_TO_UNICODE WITH ERROR) in your code instead of FS_MRCH_NM should work. Problem with this it result in null values when you have untranslatable characters.

Upvotes: 6

Related Questions