I have the following HTML. I have tried many regexes to remove the hyperlink content/text that sits inside ul and li tags only, but I haven't found one that removes the a tag text. Whenever an a tag appears inside ul and li tags, I want to replace the a tag's text with an empty string.
<ul id="foot.dir" class="content" >
<li><a href="http://www.citysearch.com/aboutcitysearch/about_us" name="search_grid.footer.1.aboutCs" rel="nofollow" id="foot.dir.about">About</a></li>
<li><a href="http://www.citysearch.com/mobile-application" name="search_grid.footer.1.mobile" id="foot.dir.apps">Apps</a></li>
</ul>
I have tried this regex, but it is not working. Here, input is a string that contains the HTML:
input = Regex.Replace(input, @"<ul[^>]*?><li><a[^>]*?>(?<option>.*?)</ul></li></a>", string.Empty);
Please help me out. Thank you.
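Regex is not a good choice for parsing HTML files.
HTML is not strict, nor is it regular in its format.
Use HtmlAgilityPack.
Regular expressions are designed for regular languages, and HTML is not one: elements nest, and the same markup can be written in many equivalent ways.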
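A minimal sketch of the fragility point, as a C# 9+ top-level program using only System.Text.RegularExpressions. The pattern and both sample strings are hypothetical, chosen to show that two equivalent ways of writing the same list item do not both match a pattern that assumes exact formatting.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical pattern that assumes <li> and <a> are written with nothing between them
var pattern = @"<li><a[^>]*>.*?</a></li>";

// The same list item written two ways; a browser treats them the same
var compact = "<li><a href=\"http://example.com\">About</a></li>";
var indented = "<li>\n  <a href=\"http://example.com\">About</a>\n</li>";

bool compactMatches = Regex.IsMatch(compact, pattern);   // true
bool indentedMatches = Regex.IsMatch(indented, pattern); // false: the extra whitespace breaks the pattern

Console.WriteLine($"compact: {compactMatches}, indented: {indentedMatches}");
```

A parser builds the same DOM from both strings, which is why the answers below reach for one.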
You can do it with HtmlAgilityPack like this:
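using HtmlAgilityPack;
var doc = new HtmlDocument();
doc.Load(yourStream);
// Select only anchors that are direct children of an <li> inside a <ul>;
// SelectNodes returns null when nothing matches
var anchors = doc.DocumentNode.SelectNodes("//ul/li/a");
if (anchors != null)
{
    foreach (var anchor in anchors)
    {
        anchor.ParentNode.RemoveChild(anchor); // removes the <a> element and its text, keeps the <li>
    }
}
doc.Save(yourOutputStream); // don't forget to save the modified document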
If you want to remove the anchor text using regex only:
input = Regex.Replace(input, @"(?<=<li[^>]*>)\s*<a.*?(?=</li>)", "", RegexOptions.Singleline);
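A quick check of that pattern against the question's markup (the name and rel attributes are trimmed for brevity), as a C# 9+ top-level program using only System.Text.RegularExpressions. It is a sketch, not a general HTML solution: the regex removes the whole a element rather than only its text, and it does not check that the li sits inside a ul.

```csharp
using System;
using System.Text.RegularExpressions;

// The question's sample markup, with some attributes trimmed
var input =
    "<ul id=\"foot.dir\" class=\"content\" >\n" +
    "<li><a href=\"http://www.citysearch.com/aboutcitysearch/about_us\" id=\"foot.dir.about\">About</a></li>\n" +
    "<li><a href=\"http://www.citysearch.com/mobile-application\" id=\"foot.dir.apps\">Apps</a></li>\n" +
    "</ul>";

// (?<=<li[^>]*>)  the match must start right after an opening <li ...> tag
// \s*<a.*?        optional whitespace, then the anchor, matched lazily...
// (?=</li>)       ...up to, but not including, the closing </li>
// Singleline lets '.' cross line breaks if an anchor spans several lines
var output = Regex.Replace(
    input,
    @"(?<=<li[^>]*>)\s*<a.*?(?=</li>)",
    string.Empty,
    RegexOptions.Singleline);

Console.WriteLine(output);
```

Running it should print the ul with two empty li elements, each anchor having been removed along with its text.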
Regex is a poor choice for parsing HTML, in particular HTML that is not consistent.
I suggest using the HTML Agility Pack to parse and change the HTML.
What exactly is the Html Agility Pack (HAP)?
This is an agile HTML parser that builds a read/write DOM and supports plain XPATH or XSLT (you actually don't HAVE to understand XPATH or XSLT to use it, don't worry...). It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant of "real world" malformed HTML. The object model is very similar to what System.Xml offers, but for HTML documents (or streams).
The source download comes with a number of sample projects showing how to use the library.