Reputation: 73918
I'm using ASP.NET 4 and C#.
I have a string that can contain special characters, accented letters, and spaces.
Example string:
#Hi this is rèally/ special strìng!!!
I would like to:
a) Remove all Special Characters, like:
Hi this is rèally special strìng
b) Convert all Accented letters to NON Accented letters, like:
Hi this is really special string
c) Replace all empty spaces with a dash (-), like:
Hi-this-is-really-special-string
My aim is to create a string suitable for a URL path, for better SEO.
Any idea how to do this with a regular expression or another technique?
Thanks for your help on this!
Upvotes: 4
Views: 9770
Reputation: 25563
Similar to mathieu's answer, but more tailored to your requirements. This solution first strips special characters and diacritics from the input string, and then replaces whitespace with dashes:
    using System.Globalization;
    using System.Text;
    using System.Text.RegularExpressions;

    string s = "#Hi this is rèally/ special strìng!!!";
    // FormD splits each accented letter into a base letter plus a combining mark.
    string normalized = s.Normalize(NormalizationForm.FormD);
    StringBuilder resultBuilder = new StringBuilder();
    foreach (var character in normalized)
    {
        UnicodeCategory category = CharUnicodeInfo.GetUnicodeCategory(character);
        // Keep only letters and spaces; combining marks and punctuation are dropped.
        if (category == UnicodeCategory.LowercaseLetter
            || category == UnicodeCategory.UppercaseLetter
            || category == UnicodeCategory.SpaceSeparator)
            resultBuilder.Append(character);
    }
    string result = Regex.Replace(resultBuilder.ToString(), @"\s+", "-");
See it in action at ideone.com.
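If you need this in more than one place, the same steps could be wrapped in a helper method. This is only a sketch: the names `SlugSketch` and `ToSlug` are made up for illustration, and the `Trim()` call (so the result never starts or ends with a dash) is my own assumption rather than something the question asked for.

```csharp
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

public static class SlugSketch // hypothetical name, for illustration only
{
    public static string ToSlug(string input)
    {
        // FormD splits "è" into "e" plus a combining accent (a NonSpacingMark).
        string normalized = input.Normalize(NormalizationForm.FormD);
        var builder = new StringBuilder(normalized.Length);

        foreach (char character in normalized)
        {
            UnicodeCategory category = CharUnicodeInfo.GetUnicodeCategory(character);
            // Keep letters and spaces; accents, punctuation and digits are dropped.
            if (category == UnicodeCategory.LowercaseLetter
                || category == UnicodeCategory.UppercaseLetter
                || category == UnicodeCategory.SpaceSeparator)
            {
                builder.Append(character);
            }
        }

        // Assumption: trim first so the slug never starts or ends with a dash.
        return Regex.Replace(builder.ToString().Trim(), @"\s+", "-");
    }
}
```

Calling `SlugSketch.ToSlug("#Hi this is rèally/ special strìng!!!")` gives `Hi-this-is-really-special-string`. Note that digits are removed as well, because only the two letter categories and the space category are kept.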
Upvotes: 9
Reputation: 31192
You should have a look at this answer: Ignoring accented letters in string comparison
Code here:
    static string RemoveDiacritics(string sIn)
    {
        string sFormD = sIn.Normalize(NormalizationForm.FormD);
        StringBuilder sb = new StringBuilder();
        foreach (char ch in sFormD)
        {
            UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(ch);
            if (uc != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(ch);
            }
        }
        return (sb.ToString().Normalize(NormalizationForm.FormC));
    }
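On its own, RemoveDiacritics only covers step (b) of the question. One possible way to cover (a) and (c) as well is to follow it with two regular expressions. Treat this as a sketch: the names `UrlSlugSketch` and `ToUrlSlug`, the decision to keep digits, and the `Trim()` call are all assumptions of mine, not part of the linked answer.

```csharp
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

public static class UrlSlugSketch // hypothetical name, for illustration only
{
    // Same approach as RemoveDiacritics above, repeated so the sketch compiles on its own.
    static string RemoveDiacritics(string sIn)
    {
        string sFormD = sIn.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder();
        foreach (char ch in sFormD)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(ch) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(ch);
            }
        }
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    public static string ToUrlSlug(string input)
    {
        // (b) Turn accented letters into their base letters.
        string text = RemoveDiacritics(input);
        // (a) Drop everything except ASCII letters, digits and whitespace.
        text = Regex.Replace(text, @"[^A-Za-z0-9\s]", "");
        // (c) Collapse each run of whitespace into a single dash.
        return Regex.Replace(text.Trim(), @"\s+", "-");
    }
}
```

With the example from the question, `UrlSlugSketch.ToUrlSlug("#Hi this is rèally/ special strìng!!!")` returns `Hi-this-is-really-special-string`. Letters that have no base-letter decomposition (for example `ø` or `ß`) are not changed by RemoveDiacritics, so the first regular expression removes them; you would need an explicit mapping if you want to keep them.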
Upvotes: 3
Reputation: 7326
I am not an expert when it comes to regular expressions, but I doubt they would be useful for this sort of task.
To me, a simple iteration over the characters of the input is enough:
    List<char> specialChars =
        new List<char>() { '!', '"', '£', '$', '%', '&', '/', '(', ')', '#' };
    string specialString = "#Hi this is rèally/ special strìng!!!";
    System.Text.StringBuilder builder =
        new System.Text.StringBuilder(specialString.Length);
    bool encounteredWhiteSpace = false;
    foreach (char ch in specialString)
    {
        char val = ch;
        // Skip special characters entirely.
        if (specialChars.Contains(val))
            continue;
        // Map accented letters to their plain equivalents.
        switch (val)
        {
            case 'è':
                val = 'e'; break;
            case 'à':
                val = 'a'; break;
            case 'ò':
                val = 'o'; break;
            case 'ù':
            case 'ü':
                val = 'u'; break;
            case 'ı':
            case 'ì':
                val = 'i'; break;
        }
        // Remember a run of whitespace; a single dash is emitted before the next kept character.
        if (val == ' ' || val == '\t')
        {
            encounteredWhiteSpace = true;
            continue;
        }
        if (encounteredWhiteSpace)
        {
            // Avoid a leading dash when the input starts with whitespace.
            if (builder.Length > 0)
                builder.Append('-');
            encounteredWhiteSpace = false;
        }
        builder.Append(val);
    }
    string result = builder.ToString();
Upvotes: 0