Reputation: 37
I am trying to get the list of <ul> elements that contain the text "[My search Text]".
I have tried the regex below, but it does not return the expected output:
<ul[^>]*>\s*?\w+?(.|\n).*(\[My search Text\]).*(.|\n).+</ul>
Sample input:
<ul><li>[My search Text] is required </li></ul>
<ul><li>[My edit Text] is not required </li></ul>
<ul><li><b>[My search Text] is mandatory </b> </li> </ul>
<ul><li><strong>[My search Text] is so mandatory </strong> </li></ul>
Expected output:
<ul><li>[My search Text] is required </li></ul>
<ul><li><b>[My search Text] is mandatory </b> </li> </ul>
<ul><li><strong>[My search Text] is so mandatory </strong> </li></ul>
Thanks in advance
Upvotes: 1
Views: 226
Reputation: 3326
Try this (for text inside ul):
<ul>.*(\[My search Text\]).+</ul>
for text inside li:
<ul>.*<li>.*(\[My search Text\]).+<\/li>.*<\/ul>
Upvotes: 0
Reputation: 626802
A note on your regex:
<ul[^>]*> - should work OK
\s*? - no need to use a lazy quantifier
\w+? - same, no need for lazy matching
(.|\n) - this makes no sense since it matches any single character only once
.* - 0 or more characters other than a newline, as many as possible
(\[My search Text\]) - a literal [My search Text]
.*(.|\n) - same as above
.+ - 1 or more characters other than a newline
</ul> - a literal </ul>
You can see that this regex does not really have good multiline support. It is also very inefficient because of the many .* parts, which require a lot of backtracking.
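If a regex really has to be used, the points above could be addressed roughly as in the sketch below. It is only an illustration, not a robust HTML parser: the class and method names (UlSearch, FindUlContaining) are made up for this example, it assumes the lists are not nested, and it uses RegexOptions.Singleline so that . also matches newlines. Note also that in the original pattern \w+? demands a word character right after <ul>, while the sample has <li> there, so that pattern cannot match the sample lines at all.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UlSearch
{
    // Hypothetical helper: returns every non-nested <ul>...</ul> block
    // whose content contains searchText literally.
    public static List<string> FindUlContaining(string html, string searchText)
    {
        // Escape the search text so that [ and other metacharacters are literal.
        string needle = Regex.Escape(searchText);

        // (?:(?!</?ul\b).)*? consumes characters one at a time but refuses to
        // step over another <ul> or </ul>, so a match stays inside one list.
        string pattern = @"<ul\b[^>]*>(?:(?!</?ul\b).)*?" + needle
                       + @"(?:(?!</?ul\b).)*?</ul>";

        var results = new List<string>();
        var options = RegexOptions.Singleline | RegexOptions.IgnoreCase;
        foreach (Match m in Regex.Matches(html, pattern, options))
            results.Add(m.Value);
        return results;
    }
}
```

Called as UlSearch.FindUlContaining(s, "[My search Text]") on the four sample lists from the question, this should return the three lists that contain the text and skip the "[My edit Text]" one. It will still break on nested lists, comments, or unusual markup, which is why a parser is the safer choice.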
I would install the HtmlAgilityPack and use the following method:
// Requires: using System; using System.Collections.Generic;
public List<string> HtmlAgilityPackGetTagOuterHTMLbyXpath(string html, string xpath)
{
    HtmlAgilityPack.HtmlDocument hap;
    var results = new List<string>();
    Uri uriResult;
    // Treat the input as a URL only if it parses as an absolute http(s) URI
    if (Uri.TryCreate(html, UriKind.Absolute, out uriResult)
        && (uriResult.Scheme == Uri.UriSchemeHttp || uriResult.Scheme == Uri.UriSchemeHttps))
    {   // html is a URL: download and parse the page
        var web = new HtmlAgilityPack.HtmlWeb();
        hap = web.Load(uriResult.AbsoluteUri);
    }
    else
    {   // html is a markup string: parse it directly
        hap = new HtmlAgilityPack.HtmlDocument();
        hap.LoadHtml(html);
    }
    // SelectNodes returns null when nothing matches the XPath
    var nodes = hap.DocumentNode.SelectNodes(xpath);
    if (nodes != null)
    {
        foreach (var node in nodes)
            results.Add(node.OuterHtml);
    }
    return results;
}
Either of these two XPath expressions should return the 3 matching <ul> nodes:
//li[contains(., 'My search Text')]/ancestor::ul[1]
//ul[.//li[contains(., 'My search Text')]]
Like this:
var res = HtmlAgilityPackGetTagOuterHTMLbyXpath(s, "//li[contains(., 'My search Text')]/ancestor::ul[1]");
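To try the two XPath expressions without installing anything, a rough sketch using only the built-in System.Xml classes is shown below. It assumes the markup is well-formed XML, which happens to be true for the sample in the question but often is not for real HTML (that is what HtmlAgilityPack is for). The names XPathDemo and SelectOuterXml, and the <root> wrapper element, are invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class XPathDemo
{
    // Hypothetical helper: evaluates an XPath over well-formed markup
    // and returns the outer XML of every selected node.
    public static List<string> SelectOuterXml(string wellFormedMarkup, string xpath)
    {
        var doc = new XmlDocument();
        // Wrap the fragment so the document has a single root element.
        doc.LoadXml("<root>" + wellFormedMarkup + "</root>");

        var results = new List<string>();
        foreach (XmlNode node in doc.SelectNodes(xpath))
            results.Add(node.OuterXml);
        return results;
    }
}
```

Both "//li[contains(., 'My search Text')]/ancestor::ul[1]" and "//ul[.//li[contains(., 'My search Text')]]" should select the same three <ul> elements from the sample input. The first starts at each matching <li> and walks up to its nearest <ul>; the second filters <ul> elements by whether they have a matching <li> descendant.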
Upvotes: 1