Reputation: 3074
I am creating a regex library to work with HTML (I'll post it on MSDN Code when it's done). One of the methods removes any whitespace before a closing tag.
<p>See the dog run </p>
The method would remove the space before the closing </p> tag. I am using this:
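// Requires: using System.Text.RegularExpressions;
public static string RemoveWhiteSpaceBeforeClosingTag(string text)
{
    // Matches the whitespace AND the "</" after it, so "</" has to be put back.
    string pattern = @"(\s+)(?:</)";
    return Regex.Replace(text, pattern, "</", RegexOptions.Singleline | RegexOptions.IgnoreCase);
}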
As you can see, I am replacing the whole match with </ because I cannot seem to match just the whitespace and exclude the closing tag. I know there's a way; I just haven't figured it out.
Upvotes: 1
Views: 1352
Reputation: 625007
\s+(?=</)
is the expression you're after. It matches one or more whitespace characters that are followed by </, without making the </ part of the match.
(?=...) is a positive lookahead. What it matches is not included in the match.
(?:...) is a non-capturing group. What it matches is included in the match; it just isn't stored as a numbered group.
That said, regular expressions are a flaky and error-prone way of processing HTML, so they should be used with caution, if at all.
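To make the difference concrete, here is a minimal sketch that returns the text each pattern actually consumes on the sample from the question. The class and method names (LookaheadDemo, FirstMatch) are hypothetical and exist only for this illustration; Regex.Match and Match.Value are the standard .NET calls.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class LookaheadDemo
{
    // Returns the text consumed by the first match of `pattern` in `input`
    // (an empty string if there is no match).
    public static string FirstMatch(string input, string pattern)
    {
        return Regex.Match(input, pattern).Value;
    }

    // Usage, with input = "<p>See the dog run </p>":
    //   FirstMatch(input, @"\s+(?:</)")  returns " </"  (non-capturing group: "</" is consumed)
    //   FirstMatch(input, @"\s+(?=</)")  returns " "    (lookahead: only the whitespace is consumed)
}
```

Because the lookahead version consumes only the whitespace, a replacement no longer needs to put the </ back.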
Upvotes: 11
Reputation: 23548
You want a lookahead (?=) pattern:
\s+(?=</)
Each match can then be replaced with the empty string ("").
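Applied to the method from the question, that might look like the sketch below. The wrapper class name HtmlWhitespace is an assumption made for this example. The Singleline and IgnoreCase options from the question are left out here because this pattern contains neither a dot nor any letters, so they would have no effect on it.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper class; only the method name comes from the question.
public static class HtmlWhitespace
{
    // One or more whitespace characters, only where "</" follows.
    // The lookahead checks for "</" without making it part of the match.
    private const string Pattern = @"\s+(?=</)";

    public static string RemoveWhiteSpaceBeforeClosingTag(string text)
    {
        // Replace just the matched whitespace with nothing; "</" is left in place.
        return Regex.Replace(text, Pattern, "");
    }

    // Usage:
    //   RemoveWhiteSpaceBeforeClosingTag("<p>See the dog run </p>")
    //   returns "<p>See the dog run</p>"
}
```

Note that \s also matches line breaks, so a closing tag that sits on its own line would be pulled up against the preceding text.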
Upvotes: 3