Reputation: 501
I have a varchar
column and i want to replace all diacritics with normal letters
For example:
São Paulo
Out: Sao Paulo
eéíãç
Out: eeiac
Upvotes: 2
Views: 2044
Reputation: 60482
A diacritical character is a composite character, i.e. can be a base char plus a diacritic, e.g.
'a' + '´' = 'á'
0061 + 00B4 = 00E1
Both 006100B4
and 00E1
result in the same character, Unicode allows to switch back and forth using normalization functions, which are supported by Teradata:
translate(string using UNICODE_TO_UNICODE_NFD)
decomposes a composite character into separate characters. Those Combining Diacritical Marks are in a Unicode block ranging from U+0300 to U+036F.
Now decompose the input and apply a Regular Expression to remove characters from this range:
select
'Ťĥïŝ ẅåš á şťŕĭńġ ŵïţĥ ḿäṇȳ ḓìāčṝĩţïĉș' as str,
REGEXP_REPLACE(translate(str using UNICODE_TO_UNICODE_NFD) -- decomposed input
,'[\x{0300}-\x{036F}]' -- diacritics
,'')
returns
Ťĥïŝ ẅåš á şťŕĭńġ ŵïţĥ ḿäṇȳ ḓìāčṝĩţïĉș
This was a string with many diacritics
If there are other decomposable characters you might need to compose them again to save space using another translate(... using UNICODE_TO_UNICODE_NFC)
If you input string has a LATIN charset it might be easier to find the limited list of diacritical characters and apply translate:
oTranslate(str, 'àáâãäåÀÁÂÃÄÅÇ...', 'aaaaaaAAAAAAC...')
Upvotes: 3