Way to convert accented characters to URL-friendly ones?

Is there a ".NET way" to convert characters like úüãáâàçéêíõóô áéíñóúü¿¡ to similar non-accented letters?

e.g. ú to u

My question is a precursor to: Handling SEO Friendly URL with Non-English Characters

If not, I guess I can always write a find-and-replace function.

Upvotes: 3

Answers (2)

MethodMan

Reputation: 18863

Here is another example, taken from a previous, similar question:

using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class Diacritics
{
  // Decomposes the string (FormKD also applies compatibility mappings such as ligatures),
  // drops combining marks, and passes every remaining char through customFolding.
  public static IEnumerable<char> RemoveDiacriticsEnum(string src, bool compatNorm, Func<char, char> customFolding)
  {
    foreach(char c in src.Normalize(compatNorm ? NormalizationForm.FormKD : NormalizationForm.FormD))
    {
      switch(CharUnicodeInfo.GetUnicodeCategory(c))
      {
        case UnicodeCategory.NonSpacingMark:
        case UnicodeCategory.SpacingCombiningMark:
        case UnicodeCategory.EnclosingMark:
          // combining marks (accents etc.) are dropped
          break;
        default:
          yield return customFolding(c);
          break;
      }
    }
  }

  public static IEnumerable<char> RemoveDiacriticsEnum(string src, bool compatNorm)
  {
    // Identity folding; delegates to the enumerator overload, not the string-building one
    return RemoveDiacriticsEnum(src, compatNorm, c => c);
  }

  public static string RemoveDiacritics(string src, bool compatNorm, Func<char, char> customFolding)
  {
    StringBuilder sb = new StringBuilder();
    foreach(char c in RemoveDiacriticsEnum(src, compatNorm, customFolding))
      sb.Append(c);
    return sb.ToString();
  }

  public static string RemoveDiacritics(string src, bool compatNorm)
  {
    return RemoveDiacritics(src, compatNorm, c => c);
  }
}

Here we have a default for the problem cases mentioned above, which simply ignores them. We have also separated building a string from generating the enumeration of characters, so we need not be wasteful in cases where the result does not need to become a string (say we were going to write the chars to output next, or do some further char-by-char manipulation).
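As a minimal sketch of that streaming case, assuming the folded text is going straight to a `TextWriter`: the class `DiacriticsDemo` and the method `WriteFolded` are hypothetical names for illustration, and the enumerator repeats the same mark-filtering as `RemoveDiacriticsEnum` (without custom folding) so the block compiles on its own. It also shows what `compatNorm` changes: the "ﬁ" ligature (U+FB01) has only a compatibility decomposition, so it survives `FormD` but becomes "fi" under `FormKD`.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Text;

// Hypothetical helper class for this sketch; mirrors RemoveDiacriticsEnum above.
public static class DiacriticsDemo
{
    // Lazily yields base characters, skipping combining marks after decomposition.
    public static IEnumerable<char> RemoveDiacriticsEnum(string src, bool compatNorm)
    {
        NormalizationForm form = compatNorm ? NormalizationForm.FormKD : NormalizationForm.FormD;
        foreach (char c in src.Normalize(form))
        {
            UnicodeCategory cat = CharUnicodeInfo.GetUnicodeCategory(c);
            if (cat != UnicodeCategory.NonSpacingMark &&
                cat != UnicodeCategory.SpacingCombiningMark &&
                cat != UnicodeCategory.EnclosingMark)
            {
                yield return c;
            }
        }
    }

    // Writes the folded characters directly to the writer; no intermediate string is built.
    public static void WriteFolded(TextWriter writer, string src, bool compatNorm)
    {
        foreach (char c in RemoveDiacriticsEnum(src, compatNorm))
            writer.Write(c);
    }
}
```

For example, writing "ﬁancé" with `compatNorm: true` produces "fiance", while `compatNorm: false` produces "ﬁance" with the ligature left intact.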

An example for a case where we also wanted to convert ł and Ł to l and L (these letters have no Unicode decomposition, so normalization alone leaves them unchanged), but had no other specialised concerns, could use:

// Maps L-with-stroke to plain L; pass this as the customFolding argument
private static char NormaliseLWithStroke(char c)
{
  switch(c)
  {
     case 'ł':
       return 'l';
     case 'Ł':
       return 'L';
     default:
       return c;
  }
}
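A self-contained sketch of plugging such a folding function in, assuming canonical decomposition (`FormD`) only. `StrokeFolding`, `FoldLWithStroke` and this two-argument `RemoveDiacritics` are hypothetical names that mirror the code above; they are not a framework API.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical class for this sketch.
public static class StrokeFolding
{
    // Same idea as NormaliseLWithStroke: handle letters that have no Unicode decomposition.
    public static char FoldLWithStroke(char c)
    {
        switch (c)
        {
            case '\u0142': return 'l'; // ł
            case '\u0141': return 'L'; // Ł
            default: return c;
        }
    }

    // Decompose, drop combining marks, then apply the custom folding to what remains.
    public static string RemoveDiacritics(string src, Func<char, char> customFolding)
    {
        var sb = new StringBuilder(src.Length);
        foreach (char c in src.Normalize(NormalizationForm.FormD))
        {
            switch (CharUnicodeInfo.GetUnicodeCategory(c))
            {
                case UnicodeCategory.NonSpacingMark:
                case UnicodeCategory.SpacingCombiningMark:
                case UnicodeCategory.EnclosingMark:
                    break; // skip accents and other combining marks
                default:
                    sb.Append(customFolding(c));
                    break;
            }
        }
        return sb.ToString();
    }
}
```

With the folding function, "Łódź" becomes "Lodz"; with identity folding (`c => c`) the accents are still removed but the stroke letter remains, giving "Łodz".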

Upvotes: 0

Chris Haas

Reputation: 55457

See this post from Michael Kaplan

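    using System.Globalization;
    using System.Text;

    static string RemoveDiacritics(string stIn) {
      // Decompose so that e.g. "é" becomes "e" followed by a combining acute accent
      string stFormD = stIn.Normalize(NormalizationForm.FormD);
      StringBuilder sb = new StringBuilder();

      for(int ich = 0; ich < stFormD.Length; ich++) {
        UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(stFormD[ich]);
        // Keep everything except the combining (non-spacing) marks
        if(uc != UnicodeCategory.NonSpacingMark) {
          sb.Append(stFormD[ich]);
        }
      }

      // Recompose whatever is left into its canonical composed form
      return(sb.ToString().Normalize(NormalizationForm.FormC));
    }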
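Since the end goal in the question is an SEO-friendly URL, here is a minimal sketch of a slug helper built on the same decompose-and-drop-marks idea. `UrlSlug.ToSlug` is a hypothetical name, and its rules (lowercase ASCII letters and digits only, every other run of characters collapsed into a single hyphen, no leading or trailing hyphens) are assumptions about what "URL friendly" should mean. Letters with no decomposition, such as ł, are treated as separators unless you fold them first.

```csharp
using System.Globalization;
using System.Text;

// Hypothetical helper: turns a title into a lowercase, hyphen-separated URL slug.
public static class UrlSlug
{
    public static string ToSlug(string title)
    {
        // Decompose so accents become separate combining marks that can be skipped.
        string decomposed = title.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        bool pendingHyphen = false;

        foreach (char c in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
                continue; // drop the accent, keep the base letter

            char lower = char.ToLowerInvariant(c);
            if ((lower >= 'a' && lower <= 'z') || (lower >= '0' && lower <= '9'))
            {
                // Emit one hyphen for any preceding run of separators, but never at the start.
                if (pendingHyphen && sb.Length > 0)
                    sb.Append('-');
                pendingHyphen = false;
                sb.Append(lower);
            }
            else
            {
                // Spaces, punctuation, ¿, ¡ and unmapped letters all act as separators.
                pendingHyphen = true;
            }
        }
        // A trailing separator run is never emitted, so there is no trailing hyphen.
        return sb.ToString();
    }
}
```

For instance, "¿Dónde está el baño?" becomes "donde-esta-el-bano".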

Upvotes: 3
