Reputation: 2564
I am trying to remove the <br /> tags that appear in between the <pre></pre> tags. My string looks like this:
using System.Text.RegularExpressions;

string str = "Test<br/><pre><br/>Test<br/></pre><br/>Test<br/>---<br/>Test<br/><pre><br/>Test<br/></pre><br/>Test";
string temp = "`##`"; // placeholder, swapped for a real newline at the end
while (Regex.IsMatch(str, @"\<pre\>(.*?)\<br/\>(.*?)\</pre\>", RegexOptions.IgnoreCase))
{
    // Each pass swaps one <br/> per match for the placeholder
    str = Regex.Replace(str, @"\<pre\>(.*?)\<br/\>(.*?)\</pre\>", "<pre>$1" + temp + "$2</pre>", RegexOptions.IgnoreCase);
}
str = str.Replace(temp, System.Environment.NewLine);
But this replaces every <br/> tag between the first <pre> and the last </pre> in the whole text, so my final outcome is:
str = "Test<br/><pre>\r\nTest\r\n</pre>\r\nTest\r\n---\r\nTest\r\n<pre>\r\nTest\r\n</pre><br/>Test"
I expect the outcome to be:
str = "Test<br/><pre>\r\nTest\r\n</pre><br/>Test<br/>---<br/>Test<br/><pre>\r\nTest\r\n</pre><br/>Test"
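The lazy (.*?) groups only prefer shorter matches; they do not stop at </pre>. Once a <pre> block has no <br/> left, the engine extends the first group past that block's </pre> until it finds a <br/> with some later </pre> after it, which is why the replacements spill over into the text between blocks. A minimal sketch reproducing this on a shortened, made-up input, using the question's pattern with <br/> in place of <br> (written as C# 9 top-level statements; the variable names are illustrative):

```csharp
using System;
using System.Text.RegularExpressions;

// Same shape as the question's pattern, using <br/> to match the sample data.
const string pattern = @"<pre>(.*?)<br/>(.*?)</pre>";

// The first <pre> block contains no <br/>, but the match still starts there:
// (.*?) keeps growing past "</pre>" until it reaches a <br/> that is
// followed by some later </pre>.
string input = "<pre>a</pre><br/>b<pre>c</pre>";
Match m = Regex.Match(input, pattern, RegexOptions.IgnoreCase);

Console.WriteLine(m.Value);           // the whole input string
Console.WriteLine(m.Groups[1].Value); // a</pre>
```

Group 1 ends up containing "a</pre>", so the <br/> that gets replaced sits outside any <pre> block.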
Upvotes: 5
Views: 1789
Reputation: 2564
Ok, I discovered the issue with my code. The problem was that a match could start at the first occurrence of <pre> and run on to the last occurrence of </pre>, instead of staying inside one <pre> block. I wanted each <pre> set to be handled individually for replacements, so I modified my code as follows:
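bool matchFound = false;
foreach (Match regExp in Regex.Matches(str, @"\<pre\>(.*?)\<br/\>(.*?)\</pre\>", RegexOptions.IgnoreCase))
{
    matchFound = true;
    // Swap the line breaks only inside the text of this match
    str = str.Replace(regExp.Value, regExp.Value.Replace("<br/>", temp));
}
// temp is then replaced with System.Environment.NewLine, as before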
and it worked well. Anyway, thanks all for your replies.
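The same idea, handling each <pre>...</pre> block on its own, can also be done in a single pass with the MatchEvaluator overload of Regex.Replace, which removes the need for the placeholder string and the loop. A minimal sketch, assuming <pre> elements are never nested and the line breaks are written exactly as <br/>; "\r\n" is used to match the expected output in the question (Environment.NewLine gives the same on Windows):

```csharp
using System;
using System.Text.RegularExpressions;

string str = "Test<br/><pre><br/>Test<br/></pre><br/>Test<br/>---<br/>Test<br/><pre><br/>Test<br/></pre><br/>Test";

// Lazy .*? stops at the first </pre>, so each match is exactly one
// <pre>...</pre> block (this assumes <pre> elements are not nested).
// Singleline lets '.' also match newlines already present in a block.
string result = Regex.Replace(
    str,
    @"<pre>.*?</pre>",
    block => block.Value.Replace("<br/>", "\r\n"),
    RegexOptions.IgnoreCase | RegexOptions.Singleline);

Console.WriteLine(result);
```

Because this pattern does not require a <br/> to be present, a <pre> block without any <br/> simply matches itself and cannot make the match run on into the next block, which the <pre>(.*?)<br>(.*?)</pre> pattern can still do.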
Upvotes: 0
Reputation: 1851
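using System.Text.RegularExpressions;

string input = "Test<br/><pre><br/>Test<br/></pre><br/>Test<br/>---<br/>Test<br/><pre><br/>Test<br/></pre><br/>Test";
// (?:(?!</pre>).) matches any character that does not start "</pre>",
// so a match can never leave the <pre> block it started in.
string pattern = @"<pre>((?:(?!</pre>).)*?)<br/>((?:(?!</pre>).)*)</pre>";
// Singleline lets '.' also match the "\r\n" inserted by earlier passes.
while (Regex.IsMatch(input, pattern, RegexOptions.Singleline))
{
    input = Regex.Replace(input, pattern, "<pre>$1\r\n$2</pre>", RegexOptions.Singleline);
}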
This will probably work, but you should use the HTML Agility Pack instead: the pattern only matches the exact spelling <br/>, not <br> or <br /> etc.
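The spelling limitation can be reduced with a more tolerant inner pattern, <br\s*/?>, applied only inside each <pre> block. A minimal sketch, assuming <pre> blocks are not nested and no attribute value inside a <pre> tag contains '>'; the mixed-case input string is made up for illustration, and this is still regex over HTML rather than a parser:

```csharp
using System;
using System.Text.RegularExpressions;

// Mixed spellings of the line-break tag, inside and outside a <pre> block.
string input = "a<br><pre>x<BR>y<br />z<br/></pre><br />b";

// Outer pattern: one <pre ...>...</pre> block per match (attributes allowed).
// Inner pattern: <br>, <br/>, <br /> in any letter case.
string result = Regex.Replace(
    input,
    @"<pre\b[^>]*>.*?</pre>",
    block => Regex.Replace(block.Value, @"<br\s*/?>", "\r\n", RegexOptions.IgnoreCase),
    RegexOptions.IgnoreCase | RegexOptions.Singleline);

Console.WriteLine(result);
```

The <br> and <br /> outside the <pre> block are left untouched; only the three variants inside it become line breaks.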
Upvotes: 0
Reputation: 498914
If you are parsing whole HTML pages, RegEx is not a good choice - see here for a good demonstration of why.
Use an HTML parser such as the HTML Agility Pack for this kind of work. It also works with fragments like the one you posted.
Upvotes: 3
Reputation: 277
Don't use regex to do it.
"Be lazy, use CPAN and use HTML::Sanitizer." -Jeff Atwood, Parsing Html The Cthulhu Way
Upvotes: 2