Reputation: 2886
Is there a ".NET way" to convert accented characters such as úüãáâàçéêíõóô áéíñóúü¿¡ to their closest non-accented letters?
e.g. ú to u
My question is a precursor to: Handling SEO Friendly URL with Non-English Characters
If not, I guess I can always write a find & replace function.
Upvotes: 3
Views: 783
Reputation: 18863
Here is another example, from a previous, similar question:
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Lazily yields the characters of src with combining marks removed,
// passing every surviving character through customFolding.
public static IEnumerable<char> RemoveDiacriticsEnum(string src, bool compatNorm, Func<char, char> customFolding)
{
    foreach (char c in src.Normalize(compatNorm ? NormalizationForm.FormKD : NormalizationForm.FormD))
    {
        switch (CharUnicodeInfo.GetUnicodeCategory(c))
        {
            case UnicodeCategory.NonSpacingMark:
            case UnicodeCategory.SpacingCombiningMark:
            case UnicodeCategory.EnclosingMark:
                // do nothing: the mark is dropped
                break;
            default:
                yield return customFolding(c);
                break;
        }
    }
}
public static IEnumerable<char> RemoveDiacriticsEnum(string src, bool compatNorm)
{
    // Call the enumerating overload, not the string-building one.
    return RemoveDiacriticsEnum(src, compatNorm, c => c);
}
public static string RemoveDiacritics(string src, bool compatNorm, Func<char, char> customFolding)
{
    StringBuilder sb = new StringBuilder();
    foreach (char c in RemoveDiacriticsEnum(src, compatNorm, customFolding))
        sb.Append(c);
    return sb.ToString();
}
public static string RemoveDiacritics(string src, bool compatNorm)
{
    return RemoveDiacritics(src, compatNorm, c => c);
}
Here the overloads without a customFolding argument pass an identity function, so the problem cases mentioned above are simply left unchanged. We've also split building the string from generating the enumeration of characters, so we need not waste work when the result doesn't have to be a string (say we were going to write the chars to output next, or do some further char-by-char manipulation).
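To illustrate the point about skipping the string-building step, here is a minimal sketch that writes the folded characters straight to a TextWriter. It is self-contained so it can be run on its own: StripMarks is a hypothetical, trimmed-down stand-in for the enumerating method (canonical decomposition only, no custom folding), and the names StreamingDemo and WriteFolded are likewise assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Text;

public static class StreamingDemo
{
    // Hypothetical stand-in for RemoveDiacriticsEnum: lazily yields every
    // character of the decomposed input that is not a combining mark.
    public static IEnumerable<char> StripMarks(string src)
    {
        foreach (char c in src.Normalize(NormalizationForm.FormD))
        {
            switch (CharUnicodeInfo.GetUnicodeCategory(c))
            {
                case UnicodeCategory.NonSpacingMark:
                case UnicodeCategory.SpacingCombiningMark:
                case UnicodeCategory.EnclosingMark:
                    break; // drop the mark
                default:
                    yield return c;
                    break;
            }
        }
    }

    // Writes the surviving characters one at a time; no StringBuilder
    // or result string is needed on this path.
    public static void WriteFolded(string src, TextWriter output)
    {
        foreach (char c in StripMarks(src))
            output.Write(c);
    }

    public static void Main()
    {
        WriteFolded("úüãáâàçéêíõóô", Console.Out);
        Console.Out.WriteLine();
    }
}
```

Note that Normalize itself still allocates the decomposed string; what this path avoids is the second pass through a StringBuilder.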
An example where we also wanted to convert ł and Ł to l and L, but had no other specialised concerns, could use:
private static char NormaliseLWithStroke(char c)
{
    switch (c)
    {
        case 'ł': // U+0142, not decomposed by normalization
            return 'l';
        case 'Ł': // U+0141
            return 'L';
        default:
            return c;
    }
}
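A call using such a folding function might look like the sketch below. It is self-contained so it can be run on its own: Fold is a hypothetical condensed version of the three-argument RemoveDiacritics (canonical decomposition only), FoldingDemo is an assumed class name, and the stroke letters are written as \u escapes to sidestep source-file encoding issues.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class FoldingDemo
{
    // Maps L-with-stroke to plain l/L; every other character passes through.
    public static char NormaliseLWithStroke(char c)
    {
        switch (c)
        {
            case '\u0142': return 'l'; // ł
            case '\u0141': return 'L'; // Ł
            default: return c;
        }
    }

    // Hypothetical condensed RemoveDiacritics: decompose, drop combining
    // marks, and apply customFolding to everything that remains.
    public static string Fold(string src, Func<char, char> customFolding)
    {
        var sb = new StringBuilder(src.Length);
        foreach (char c in src.Normalize(NormalizationForm.FormD))
        {
            UnicodeCategory cat = CharUnicodeInfo.GetUnicodeCategory(c);
            bool isMark = cat == UnicodeCategory.NonSpacingMark
                       || cat == UnicodeCategory.SpacingCombiningMark
                       || cat == UnicodeCategory.EnclosingMark;
            if (!isMark)
                sb.Append(customFolding(c));
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string input = "\u0141\u00F3d\u017A"; // "Łódź"
        Console.WriteLine(Fold(input, c => c));               // accents gone, stroke kept
        Console.WriteLine(Fold(input, NormaliseLWithStroke)); // Lodz
    }
}
```

Ł has no canonical decomposition, so normalization alone leaves it in place; only the folding callback turns it into L.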
Upvotes: 0
Reputation: 55457
See this post from Michael Kaplan:
static string RemoveDiacritics(string stIn) {
    // Decompose so each accent becomes a separate combining character.
    string stFormD = stIn.Normalize(NormalizationForm.FormD);
    StringBuilder sb = new StringBuilder();
    for(int ich = 0; ich < stFormD.Length; ich++) {
        UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(stFormD[ich]);
        // Keep everything except non-spacing marks (the accents).
        if(uc != UnicodeCategory.NonSpacingMark) {
            sb.Append(stFormD[ich]);
        }
    }
    // Recompose whatever is left.
    return(sb.ToString().Normalize(NormalizationForm.FormC));
}
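Applied to part of the sample text from the question, the method above can be exercised as in this rough sketch. The wrapper class KaplanDemo and its Main method are assumptions added only to make the snippet runnable; the method body is the one quoted above.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class KaplanDemo
{
    // Same logic as the quoted method: decompose, drop non-spacing
    // marks, then recompose.
    public static string RemoveDiacritics(string stIn)
    {
        string stFormD = stIn.Normalize(NormalizationForm.FormD);
        StringBuilder sb = new StringBuilder();
        for (int ich = 0; ich < stFormD.Length; ich++)
        {
            UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(stFormD[ich]);
            if (uc != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(stFormD[ich]);
            }
        }
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    public static void Main()
    {
        // Accented letters lose their accents; the inverted punctuation
        // marks are not diacritics and pass through unchanged.
        Console.WriteLine(RemoveDiacritics("áéíñóúü¿¡"));
    }
}
```

Since ¿ and ¡ are punctuation rather than accented letters, they survive this step, so a URL slug would still need a separate pass to strip or replace them.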
Upvotes: 3