Reputation: 131
I am trying to remove all accents from a string column in Snowflake. In MSSQL I can use the collate function to do this. For example I have this string in a column: 'JESÚSSMITH12345'. I want to remove the accent from above the U:
In MSSQL I can do the following:
select ('JESÚSSMITH12345' Collate SQL_Latin1_General_CP1253_CI_AI) as identifier
returns: JESUSSMITH12345
However, I can't find a way to do this in Snowflake, other than using the TRANSLATE function. Using the TRANSLATE function isn't a good solution because, I would have to define every possible letter with an accent to do the translation. For example, this works in Snowflake:
select translate('JESÚSSMITH12345', 'Ú', 'U');
returns: JESUSSMITH12345
However, if a column contains any value other than U with an accent, I have to add that to my translation. It's not ideal.
Using the collate function in Snowflake, even with "ai" (a.k.a accent-insensitive), this still returns the string with the accented U.
For example in Snowflake:
select collate('JESÚSSMITH12345', 'en-ai');
returns: JESÚSSMITH12345
Upvotes: 2
Views: 4754
Reputation: 59225
Update: Better JS line using {Diacritic} by Greg Pavlik
return str.normalize("NFD").replace(/\p{Diacritic}/gu, "");
You can solve this with a JS UDF in Snowflake:
CREATE OR REPLACE FUNCTION normalize_js(S string)
RETURNS string
LANGUAGE JAVASCRIPT
AS 'return S.normalize("NFD").replace(/[\u0300-\u036f]/g, "");'
;
select normalize_js('áéÉña');
-- 'aeEna'
I got that JS code from Remove accents/diacritics in a string in JavaScript.
Upvotes: 3
Reputation: 9788
the problem that you are facing is that U and Ú are completely different characters so from a coding/SQL perspective the statement "remove the accent from above the U" is meaningless.
The only option you have is to use, as you have done, the TRANSLATE function (or build equivalent functionality in a Stored Procedure). You will need to take every character that has an ascii value >= 128 and provide the character you want to replace it with. You could hard code this or, for more flexibility, put the accented chars, and the char you want to translate them to, into a table and use this in your SP
Upvotes: 1