Tony Basallo

Reputation: 3074

Regex Pattern for Whitespace

I am creating a regex library to work with HTML (I'll post it on MSDN Code when it's done). One of the methods removes any whitespace before a closing tag.

<p>See the dog run </p>

It would eliminate the space before the closing paragraph tag. I am using this:

    public static string RemoveWhiteSpaceBeforeClosingTag(string text)
    {
        string pattern = @"(\s+)(?:</)";
        return Regex.Replace(text, pattern, "</", RegexOptions.Singleline | RegexOptions.IgnoreCase);
    }

As you can see I am replacing the spaces with </ since I cannot seem to match just the space and exclude the closing tag. I know there's a way - I just haven't figured it out.

Upvotes: 1

Views: 1352

Answers (2)

cletus

Reputation: 625007

\s+(?=</)

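is the expression you're after. It means one or more whitespace characters followed by </. The (?=...) lookahead only checks that </ comes next; it is not part of the match, so only the whitespace gets replaced.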

That all being said, regular expressions are a flaky and error-prone way of processing HTML so should be used with caution if at all.
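To make that caution concrete, here is a rough sketch of two inputs where the pattern changes more than intended. The class name `LookaheadCaveat`, the method name `Strip`, and the sample strings are made up for illustration; only `Regex.Replace` and the pattern come from the discussion above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, used only to show where the pattern over-matches.
public static class LookaheadCaveat
{
    public static string Strip(string html)
    {
        // Removes any whitespace run that is immediately followed by "</",
        // regardless of which element (or string literal) it appears in.
        return Regex.Replace(html, @"\s+(?=</)", "");
    }
}
```

With this sketch, `Strip("<pre>total:   </pre>")` gives `<pre>total:</pre>`, even though whitespace inside `<pre>` is significant, and `Strip("<script>var s = \"a </b>\";</script>")` rewrites the JavaScript string literal to `"a</b>"`. The regex has no idea which context it is in, which is why an HTML parser is the safer tool when such cases matter.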

Upvotes: 11

Daniel Martin

Reputation: 23548

You want a lookahead (?=) pattern:

\s+(?=</)

That can be replaced with ""
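Dropped into the method from the question, that might look like the sketch below. The wrapper class name `HtmlWhitespace` is an assumption for illustration (the question does not show the containing class); the method name is the asker's.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical container class; the question only shows the method.
public static class HtmlWhitespace
{
    // \s+    : one or more whitespace characters (this is what gets removed)
    // (?=</) : zero-width lookahead, requires "</" next without consuming it
    private const string Pattern = @"\s+(?=</)";

    public static string RemoveWhiteSpaceBeforeClosingTag(string text)
    {
        // Replacing with the empty string leaves the "</" untouched,
        // because the lookahead never made it part of the match.
        return Regex.Replace(text, Pattern, "");
    }
}
```

Calling `HtmlWhitespace.RemoveWhiteSpaceBeforeClosingTag("<p>See the dog run </p>")` returns `<p>See the dog run</p>`. The `Singleline` and `IgnoreCase` options are left out here because they do not affect this pattern: `Singleline` only changes what `.` matches, `\s` already matches newlines, and the pattern contains no letters.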

Upvotes: 3
