whoah

Reputation: 4443

MVC nice URLs and special characters in regex

How can I change this regex, Regex.Replace(encodedUrl, @"[^a-z0-9]", "-");, so that it does not replace special characters such as ę, ą, ó, ł, etc.?

Here is my method. I use it to generate nice URLs that do not contain characters such as .,#$%@:;

    public static string ToSeoUrl(this string url)
    {
        // make the url lowercase (null is treated as an empty string)
        string encodedUrl = (url ?? "").ToLower();

        // replace each run of & with "and"
        encodedUrl = Regex.Replace(encodedUrl, @"\&+", "and");

        // remove apostrophes
        encodedUrl = encodedUrl.Replace("'", "");

        // replace every character that is not a-z or 0-9 with a hyphen
        encodedUrl = Regex.Replace(encodedUrl, @"[^a-z0-9]", "-");

        // collapse runs of hyphens into a single hyphen
        encodedUrl = Regex.Replace(encodedUrl, @"-+", "-");

        // trim leading and trailing hyphens
        encodedUrl = encodedUrl.Trim('-');

        return encodedUrl;
    }

Regards

Upvotes: 1

Views: 980

Answers (2)

spender

Reputation: 120450

Although this doesn't answer your question directly, the following method for stripping accents, diacritics, etc. might be handy.

    public static String RemoveAccentsAndDiacritics(this String s)
    {
        return string.Join(string.Empty,
                           s
                               .Normalize(NormalizationForm.FormD)
                               .Where(c => 
                                  CharUnicodeInfo.GetUnicodeCategory(c) != 
                                      UnicodeCategory.NonSpacingMark));
    }
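
As a rough sketch of how this could feed into the URL method from the question: strip the marks first, then apply the original ASCII-only regex. The class name AsciiSlugSketch and the method ToAsciiSeoUrl are made up for illustration. One caveat worth knowing: ł has no canonical decomposition, so FormD normalization leaves it untouched and it has to be mapped by hand.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper class; the names are illustrative only.
public static class AsciiSlugSketch
{
    // Same approach as the method above: decompose, then drop combining marks.
    public static string RemoveAccentsAndDiacritics(string s)
    {
        return string.Join(string.Empty,
            s.Normalize(NormalizationForm.FormD)
             .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) !=
                         UnicodeCategory.NonSpacingMark));
    }

    public static string ToAsciiSeoUrl(string url)
    {
        string s = RemoveAccentsAndDiacritics((url ?? "").ToLowerInvariant());

        // ł (U+0142) does not decompose under FormD, so map it explicitly.
        s = s.Replace('ł', 'l');

        // Replace each run of remaining non-alphanumeric characters with one hyphen.
        s = Regex.Replace(s, @"[^a-z0-9]+", "-");

        return s.Trim('-');
    }
}
```

With this sketch, AsciiSlugSketch.ToAsciiSeoUrl("Zażółć gęślą jaźń") gives "zazolc-gesla-jazn". Other letters that are not built from a base letter plus a combining mark (for example ø or đ) would need the same kind of manual mapping.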

Upvotes: 1

Oded

Reputation: 499002

You can add the special characters to the character class:

    @"[^a-z0-9ęąół]"

The ^ at the start negates the character class, so the regex matches any character that is not a-z, 0-9, or one of the other characters you put between the []. Any character you add to the class is therefore left alone by the replacement.
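
If listing every letter by hand becomes tedious, one possible alternative is to use Unicode categories in the negated class: \p{L} matches any letter and \p{Nd} any decimal digit. The sketch below is an assumption-laden variant of the method from the question, not a drop-in replacement; the names SeoUrlSketch and ToSeoUrlKeepingLetters are hypothetical. It normalizes to FormC first so that letters arriving as base letter plus combining mark are composed before the regex runs.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper class; the names are illustrative only.
public static class SeoUrlSketch
{
    public static string ToSeoUrlKeepingLetters(string url)
    {
        // Compose characters (FormC) so that e.g. "e" + ogonek becomes a single "ę".
        string encodedUrl = (url ?? "")
            .Normalize(NormalizationForm.FormC)
            .ToLowerInvariant();

        // Replace each run of & with "and", then drop apostrophes.
        encodedUrl = Regex.Replace(encodedUrl, @"&+", "and");
        encodedUrl = encodedUrl.Replace("'", "");

        // Anything that is not a Unicode letter or decimal digit becomes a hyphen.
        encodedUrl = Regex.Replace(encodedUrl, @"[^\p{L}\p{Nd}]", "-");

        // Collapse runs of hyphens and trim them from both ends.
        encodedUrl = Regex.Replace(encodedUrl, @"-+", "-");
        return encodedUrl.Trim('-');
    }
}
```

For example, SeoUrlSketch.ToSeoUrlKeepingLetters("Zażółć gęślą jaźń!") gives "zażółć-gęślą-jaźń". Note that \p{L} lets through letters from every script, which is broader than the explicit list above; if only Polish letters should survive, the explicit character class is the stricter choice.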

Upvotes: 4
