GibboK

Reputation: 73918

Manipulating a String: Removing special characters - Change all accented letters to non-accented

I'm using ASP.NET 4 and C#.

I have a string that may contain special characters and accented letters, for example:

#Hi this          is  rèally/ special strìng!!!

I would like to:

a) Remove all special characters, giving:

Hi this          is  rèally special strìng

b) Convert all accented letters to non-accented letters, giving:

Hi this          is  really special string

c) Replace each run of spaces with a single dash (-), giving:

Hi-this-is-really-special-string

My aim is to create a string suitable for a URL path, for better SEO.

Any idea how to do this with regular expressions or another technique?

Thanks for your help on this!

Upvotes: 4

Views: 9770

Answers (3)

Jens

Reputation: 25563

Similar to mathieu's answer, but more tailored to your requirements. This solution first strips special characters and diacritics from the input string, and then replaces whitespace with dashes:

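using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

string s = "#Hi this          is  rèally/ special strìng!!!";
// FormD splits each accented letter into a base letter plus a combining mark.
string normalized = s.Normalize(NormalizationForm.FormD);

StringBuilder resultBuilder = new StringBuilder();
foreach (var character in normalized)
{
    // Keep only letters and spaces; combining marks and punctuation are dropped.
    UnicodeCategory category = CharUnicodeInfo.GetUnicodeCategory(character);
    if (category == UnicodeCategory.LowercaseLetter
        || category == UnicodeCategory.UppercaseLetter
        || category == UnicodeCategory.SpaceSeparator)
        resultBuilder.Append(character);
}
// Trim so outer spaces don't become dashes, then collapse each whitespace run into one dash.
string result = Regex.Replace(resultBuilder.ToString().Trim(), @"\s+", "-");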

See it in action at ideone.com.
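To reuse this for every page title, the same steps can be wrapped in a method. This is a minimal sketch, not part of the original answer: the names `SlugHelper` and `ToSlug` are hypothetical, and it assumes that digits should survive in the URL (the snippet above drops them) and that leading or trailing spaces should not turn into dashes.

```csharp
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

public static class SlugHelper
{
    // Hypothetical helper wrapping the approach above: decompose, keep only
    // letters, digits and spaces, then collapse whitespace runs into dashes.
    public static string ToSlug(string input)
    {
        // FormD splits "è" into "e" followed by a combining grave accent.
        string normalized = input.Normalize(NormalizationForm.FormD);
        var builder = new StringBuilder(normalized.Length);

        foreach (char c in normalized)
        {
            switch (CharUnicodeInfo.GetUnicodeCategory(c))
            {
                case UnicodeCategory.LowercaseLetter:
                case UnicodeCategory.UppercaseLetter:
                case UnicodeCategory.DecimalDigitNumber: // assumption: keep digits
                case UnicodeCategory.SpaceSeparator:
                    builder.Append(c);
                    break;
                // Anything else (punctuation, symbols, combining marks) is skipped.
            }
        }

        // Trim first so outer spaces never become leading/trailing dashes.
        return Regex.Replace(builder.ToString().Trim(), @"\s+", "-");
    }
}
```

Case is left untouched to match the expected output in the question; if you want lowercase URLs, call `ToLowerInvariant()` on the result as a separate step.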

Upvotes: 9

mathieu

Reputation: 31192

You should have a look at this answer: Ignoring accented letters in string comparison

Here is the code:

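using System.Globalization;
using System.Text;

static string RemoveDiacritics(string sIn)
{
  // Decompose so each accent becomes a separate NonSpacingMark character.
  string sFormD = sIn.Normalize(NormalizationForm.FormD);
  StringBuilder sb = new StringBuilder();

  foreach (char ch in sFormD)
  {
    UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(ch);
    // Keep everything except the combining accent marks.
    if (uc != UnicodeCategory.NonSpacingMark)
    {
      sb.Append(ch);
    }
  }

  // Recompose whatever remains into its canonical composed form.
  return sb.ToString().Normalize(NormalizationForm.FormC);
}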
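RemoveDiacritics covers only step b of the question. Here is a minimal sketch of combining it with two regular-expression replacements for steps a and c. The names `UrlPath` and `ToUrlPath` are hypothetical, and the sketch assumes that only ASCII letters, digits and whitespace should survive the filter.

```csharp
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

public static class UrlPath
{
    // Same logic as RemoveDiacritics above: drop combining marks after FormD.
    public static string RemoveDiacritics(string sIn)
    {
        string sFormD = sIn.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(sFormD.Length);
        foreach (char ch in sFormD)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(ch) != UnicodeCategory.NonSpacingMark)
                sb.Append(ch);
        }
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    // Hypothetical helper: step b first, then step a (drop anything that is not
    // an ASCII letter, digit or whitespace), then step c (whitespace runs -> "-").
    public static string ToUrlPath(string input)
    {
        string plain = RemoveDiacritics(input);
        string cleaned = Regex.Replace(plain, @"[^A-Za-z0-9\s]", "");
        return Regex.Replace(cleaned.Trim(), @"\s+", "-");
    }
}
```

Because the ASCII filter runs after diacritic removal, letters that have no Unicode decomposition (for example `ß` or the dotless `ı`) are removed rather than transliterated. Such letters need an explicit mapping, such as a hand-written switch.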

Upvotes: 3

tafa

Reputation: 7326

I am not an expert when it comes to regular expressions, but I doubt they would be useful for this sort of computation.

To me, a simple iteration over the characters of the input is enough:

using System.Collections.Generic;

// Characters to drop outright.
List<char> specialChars = 
    new List<char>() { '!', '"', '£', '$', '%', '&', '/', '(', ')', '#' };

string specialString = "#Hi this          is  rèally/ special strìng!!!";

System.Text.StringBuilder builder =
    new System.Text.StringBuilder(specialString.Length);

bool encounteredWhiteSpace = false;


foreach (char ch in specialString)
{
    char val = ch;

    if (specialChars.Contains(val))
        continue;

    // Map a handful of accented letters by hand; any other character passes through unchanged.
    switch (val)
    {
        case 'è':
            val = 'e'; break;
        case 'à':
            val = 'a'; break;
        case 'ò':
            val = 'o'; break;
        case 'ù':
        case 'ü':
            val = 'u'; break;
        case 'ı':
        case 'ì':
            val = 'i'; break;
    }

    if (val == ' ' || val == '\t')
    {
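        // Only remember the gap once a word has been written, so the result never starts with a dash.
        encounteredWhiteSpace = builder.Length > 0;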
        continue;
    }

    if (encounteredWhiteSpace)
    {
        builder.Append('-');
        encounteredWhiteSpace = false;
    }

    builder.Append(val);
}

string result = builder.ToString();
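The switch has to grow with every accented letter you want to support. Here is a minimal sketch of the same loop driven by lookup tables instead. The names `TableSlug`, `AccentMap` and `SpecialChars` are hypothetical, and the table holds only the letters handled by the switch above. The sketch also avoids a leading dash when the input starts with whitespace.

```csharp
using System.Collections.Generic;
using System.Text;

public static class TableSlug
{
    // Hypothetical lookup table; extend it with whatever letters your input uses.
    private static readonly Dictionary<char, char> AccentMap = new Dictionary<char, char>
    {
        ['è'] = 'e', ['à'] = 'a', ['ò'] = 'o',
        ['ù'] = 'u', ['ü'] = 'u', ['ı'] = 'i', ['ì'] = 'i',
    };

    private static readonly HashSet<char> SpecialChars =
        new HashSet<char> { '!', '"', '£', '$', '%', '&', '/', '(', ')', '#' };

    public static string ToSlug(string input)
    {
        var builder = new StringBuilder(input.Length);
        bool pendingDash = false;

        foreach (char ch in input)
        {
            if (SpecialChars.Contains(ch))
                continue;

            if (ch == ' ' || ch == '\t')
            {
                // Only emit a dash between words, never at the start.
                pendingDash = builder.Length > 0;
                continue;
            }

            if (pendingDash)
            {
                builder.Append('-');
                pendingDash = false;
            }

            // Map known accented letters; leave everything else unchanged.
            builder.Append(AccentMap.TryGetValue(ch, out char plain) ? plain : ch);
        }

        return builder.ToString();
    }
}
```

For a list this short, the switch from `List<char>.Contains` to a `HashSet<char>` barely matters for speed. The main gain is that supporting another letter becomes a one-line data change instead of a new `case`.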

Upvotes: 0
