Reputation: 11
I want to remove a tag, together with the content between its opening and closing tags, from my source.
Here is my source:
<tr>
<td class="ds_label" width="40%" style="font-size: 70%;"></td>
<td id="table_cell_1585" class="ds_label">
<a class="tt" href="#" onClick="return false;">
<table class="tooltip" style="width:300px;" cellpadding="0" cellspacing="0" border=0>
</a>
</td>
<td class="ds_data" width="60%" style="font-size: 70%">800 x 480 pixels</td>
</tr>
I want to remove the whole <a> tag along with its content.
I tried the following, where response contains my source code:
response = Regex.Replace(response, "<a>(.|\n)*?</a>", string.Empty);
but it's not working.
Please advise.
Upvotes: 1
Views: 3784
Reputation: 6136
First, try to avoid using regex to work with HTML. It's the wrong tool because there are too many edge cases for it to be reliable or secure. Use a library designed to work with structured documents, such as HTMLAgilityPack.
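As a rough illustration of the parser-based approach, here is a minimal sketch. It assumes the HtmlAgilityPack NuGet package is installed; the class name AnchorStripper and the method RemoveAnchors are hypothetical names chosen for this example, not part of the library.

```csharp
using System.Linq;
using HtmlAgilityPack; // third-party NuGet package, assumed to be installed

public static class AnchorStripper
{
    // Hypothetical helper: returns the markup with every <a> element
    // (and everything inside it) removed.
    public static string RemoveAnchors(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when the XPath matches nothing.
        var anchors = doc.DocumentNode.SelectNodes("//a");
        if (anchors != null)
        {
            // Copy to a list so the tree is not modified while enumerating.
            foreach (var anchor in anchors.ToList())
            {
                anchor.Remove();
            }
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

Usage would be something like response = AnchorStripper.RemoveAnchors(response);. Note that the parser may repair malformed markup, such as the unclosed <table> in the question, so the output is not guaranteed to be byte-for-byte identical to the input outside the removed elements.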
When you use a string literal to define a regular expression in C#, it's a good idea to use a verbatim string literal (prefixed with @) so that backslash sequences in the pattern are passed to the regex engine rather than being interpreted by the C# compiler. In the case of this question, @"<a>(.|\n)*?</a>" will stop the \n from being treated as an escape sequence in C#.
A newline can consist of \r, \n, or both (\r\n).
HTML a tags contain attributes like href, so <a> is unlikely to match anything because of the > immediately after the tag name.
Use RegexOptions.Singleline in the options argument to ensure that . matches any character, including newlines.
This unit test succeeds.
[Test]
public void Test()
{
    // Singleline makes . match newlines, so the pattern can span lines.
    Regex pattern = new Regex(@"<a.*?</a>", RegexOptions.Singleline);
    string input = "foo <a href=\"//example.com\">\r\nbaz</a> bar";
    // Two spaces remain: the one before <a ...> and the one after </a>.
    string expected = "foo  bar";
    string actual = pattern.Replace(input, string.Empty);
    Assert.AreEqual(expected, actual);
}
However, be aware that this is not a secure way of handling user input or any kind of data that is not pre-defined because regular expressions like this can easily be evaded.
Upvotes: 2
Reputation: 12528
Try this regex:
<a\b[^>]*>(.*?)</a>
[TestMethod]
public void TestMethod1()
{
    var source =
        @"
<tr>
<td class='ds_label' width='40%' style='font-size: 70%;'></td>
<td id='table_cell_1585' class='ds_label'>
<a class='tt' href='#' onClick='return false;'>
<table class='tooltip' style='width:300px;' cellpadding='0' cellspacing='0' border=0>
</a>
</td>
<td class='ds_data' width='60%' style='font-size: 70%'>800 x 480 pixels</td>
</tr>";
    // Singleline lets . match the newlines between <a ...> and </a>,
    // so the tag is removed together with its content.
    source = Regex.Replace(source, @"<a\b[^>]*>(.*?)</a>", string.Empty, RegexOptions.Singleline);
    Console.Write(source);
}
Upvotes: 0
Reputation: 6554
Regex is not a good tool for parsing HTML. Take a look at HTMLAgilityPack instead to save yourself some work.
Upvotes: 2