Reputation: 4443
How can I edit this regex, Regex.Replace(encodedUrl, @"[^a-z0-9]", "-");
so that it does not delete special characters like ę, ą, ó, ł, etc.?
Here is my method. I use it to generate nice URLs that do not contain characters such as .,#$%@:;
public static string ToSeoUrl(this string url)
{
    // make the url lowercase
    string encodedUrl = (url ?? "").ToLower();
    // replace & with and
    encodedUrl = Regex.Replace(encodedUrl, @"\&+", "and");
    // remove apostrophes
    encodedUrl = encodedUrl.Replace("'", "");
    // replace invalid characters with a hyphen
    encodedUrl = Regex.Replace(encodedUrl, @"[^a-z0-9]", "-");
    // collapse runs of hyphens into one
    encodedUrl = Regex.Replace(encodedUrl, @"-+", "-");
    // trim leading & trailing hyphens
    encodedUrl = encodedUrl.Trim('-');
    return encodedUrl;
}
Regards
Upvotes: 1
Views: 980
Reputation: 120450
Although this doesn't answer your question directly, the following method for stripping accents, diacritics, etc. might be handy.
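// Needs: using System.Globalization; using System.Linq; using System.Text;
public static String RemoveAccentsAndDiacritics(this String s)
{
    // Decompose each character into base letter + combining marks (FormD),
    // then drop the combining marks.
    return string.Join(string.Empty,
        s
            .Normalize(NormalizationForm.FormD)
            .Where(c =>
                CharUnicodeInfo.GetUnicodeCategory(c) !=
                UnicodeCategory.NonSpacingMark));
}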
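To see what the method above does with the characters from the question, here is a small self-contained sketch. The class name AccentDemo and the Main method are only there to make it runnable; they are not part of the original answer. Note that ł has no canonical decomposition in Unicode, so it passes through unchanged, while ę, ą and ó lose their marks.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

// Hypothetical wrapper class, added only so the sketch compiles on its own.
public static class AccentDemo
{
    public static string RemoveAccentsAndDiacritics(this string s)
    {
        // FormD splits "ę" into "e" + a combining ogonek, "ó" into "o" + a combining acute, etc.
        // Filtering out NonSpacingMark characters then leaves only the base letters.
        return string.Join(string.Empty,
            s.Normalize(NormalizationForm.FormD)
             .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark));
    }

    public static void Main()
    {
        // ę -> e, ą -> a, ó -> o; ł does not decompose, so it is kept as-is.
        Console.WriteLine("ęąół".RemoveAccentsAndDiacritics()); // eaoł
    }
}
```

If you want ł turned into l as well, it would need an explicit Replace("ł", "l") before or after this step.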
Upvotes: 1
Reputation: 499002
You can add the special characters to the character class:
@"[^a-z0-9ęąół]"
The regex essentially matches any character that isn't a-z, 0-9, or one of the other characters you put between the []; that negation is the meaning of the ^ at the start.
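As a rough sketch of how this plays out in the method from the question: the code below runs the same steps with two different negated classes. The first is the explicit list from this answer. The second, [^\p{L}\p{Nd}], is a different technique (Unicode categories: any letter, any decimal digit) that you might prefer if you don't want to enumerate every letter. The class name SeoUrlSketch, the method name Slugify and its invalidPattern parameter are made up for illustration, and ToLowerInvariant is used here instead of the question's ToLower as an assumption that culture-independent lowercasing is what you want.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical names throughout; only the steps mirror the question's ToSeoUrl.
public static class SeoUrlSketch
{
    // Explicit list: keeps only the extra letters you name.
    public const string ExplicitList = @"[^a-z0-9ęąół]";
    // Unicode categories: keeps every letter (\p{L}) and decimal digit (\p{Nd}).
    public const string AnyLetterOrDigit = @"[^\p{L}\p{Nd}]";

    public static string Slugify(string url, string invalidPattern)
    {
        string encodedUrl = (url ?? "").ToLowerInvariant();
        encodedUrl = Regex.Replace(encodedUrl, @"&+", "and");        // & -> and
        encodedUrl = encodedUrl.Replace("'", "");                    // drop apostrophes
        encodedUrl = Regex.Replace(encodedUrl, invalidPattern, "-"); // everything NOT in the class -> "-"
        encodedUrl = Regex.Replace(encodedUrl, @"-+", "-");          // collapse repeated hyphens
        return encodedUrl.Trim('-');
    }

    public static void Main()
    {
        // ź is not in the explicit list, so it is replaced by a hyphen.
        Console.WriteLine(Slugify("Łódź & Kraków", ExplicitList));     // łód-and-kraków
        // \p{L} covers ź as well, so the whole word survives.
        Console.WriteLine(Slugify("Łódź & Kraków", AnyLetterOrDigit)); // łódź-and-kraków
    }
}
```

The explicit list is the smallest change to the original regex; the category-based class trades that precision for not having to remember every accented letter you might encounter.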
Upvotes: 4