Getting a list which contains specific text using regex

Question

I am trying to get the "ul" lists that contain the term "[My search Text]".

I have tried the regex below, but it does not return the expected output:

]*>\s*?\w+?(.|\n).*(\[My search Text\]).*(.|\n).+

Input :

[My search Text] is required  
[My edit Text] is not required 
[My search Text] is mandatory  
    
[My search Text] is so mandatory

Desired Output :

[My search Text] is required    
[My search Text] is mandatory  
    
[My search Text] is so mandatory

Thanks in advance

Wiktor Stribiżew · Accepted Answer

A note on your regex:

]*> - should work OK
\s*? - no need to use a lazy quantifier here
\w+? - same, no need for lazy matching
(.|\n) - makes little sense here, since it matches just one character (any character, including a newline)
.* - 0 or more characters other than a newline, as many as possible
(\[My search Text\]) - a literal [My search Text]
.*(.|\n) - same as above
.+ - 1 or more characters other than a newline

- literal .

As you can see, this regex has no real multiline support: . does not match a newline, and each (.|\n) covers only a single character. It is also very inefficient, because the many .* subpatterns cause a lot of backtracking.
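To illustrate the newline point, here is a minimal sketch using System.Text.RegularExpressions. The sample markup, the class name DotNewlineDemo and the simplified pattern are assumptions made for the demo; they are not the asker's real input or regex.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DotNewlineDemo
{
    // Hypothetical sample: the search text sits on its own line inside a list.
    public const string Sample =
        "<ul>\n<li>[My search Text] is required</li>\n</ul>";

    // Simplified pattern for illustration only (not the pattern from the question).
    private const string Pattern = @"<ul>.*\[My search Text\].*</ul>";

    // By default '.' does not match '\n', so the match cannot span lines.
    public static bool MatchesDefault(string input) =>
        Regex.IsMatch(input, Pattern);

    // With RegexOptions.Singleline, '.' also matches '\n'.
    public static bool MatchesSingleline(string input) =>
        Regex.IsMatch(input, Pattern, RegexOptions.Singleline);
}
```

With this sample, MatchesDefault should report no match and MatchesSingleline should report a match. Even so, a greedy .* in single-line mode would run from the first opening tag to the last closing tag when several lists are present, so it still would not isolate individual lists.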

I would install the HtmlAgilityPack and use the following method:

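using System;
using System.Collections.Generic;

// Returns the outer HTML of every node matched by the XPath expression.
// The html argument may be either an absolute URL or a raw HTML string.
public List<string> HtmlAgilityPackGetTagOuterHTMLbyXpath(string html, string xpath)
{
    HtmlAgilityPack.HtmlDocument hap;
    var results = new List<string>();
    Uri uriResult;
    if (Uri.TryCreate(html, UriKind.Absolute, out uriResult)
        && (uriResult.Scheme == Uri.UriSchemeHttp || uriResult.Scheme == Uri.UriSchemeHttps))
    { // html is a URL (https is accepted as well as http)
        var doc = new HtmlAgilityPack.HtmlWeb();
        hap = doc.Load(uriResult.AbsoluteUri);
    }
    else
    { // html is a string
        hap = new HtmlAgilityPack.HtmlDocument();
        hap.LoadHtml(html);
    }
    // SelectNodes returns null when nothing matches
    var nodes = hap.DocumentNode.SelectNodes(xpath);
    if (nodes != null)
    {
        foreach (var node in nodes)
            results.Add(node.OuterHtml);
    }
    return results;
}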

Use it with either of these 2 XPaths, which should return the 3 matching ul elements:

//li[contains(., 'My search Text')]/ancestor::ul[1]
//ul[.//li[contains(., 'My search Text')]]

Like this:

var res = HtmlAgilityPackGetTagOuterHTMLbyXpath(s, "//li[contains(., 'My search Text')]/ancestor::ul[1]");
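To see what the two XPaths select without installing anything, here is a small dependency-free sketch. It swaps HtmlAgilityPack for the built-in System.Xml classes, so it only works on well-formed markup (real-world HTML often is not). The XPathDemo class, its Select helper and the sample input are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class XPathDemo
{
    // Hypothetical well-formed sample: two lists contain the search text, one does not.
    public const string Sample =
        "<root>" +
        "<ul><li>[My search Text] is required</li></ul>" +
        "<ul><li>[My edit Text] is not required</li></ul>" +
        "<ul><li>[My search Text] is mandatory</li></ul>" +
        "</root>";

    // Returns the outer XML of every node matched by the XPath expression.
    public static List<string> Select(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        var results = new List<string>();
        foreach (XmlNode node in doc.SelectNodes(xpath))
            results.Add(node.OuterXml);
        return results;
    }
}
```

On this sample both expressions should select the same two ul elements and skip the one that only contains "[My edit Text]". The first expression starts from each matching li and walks up to its nearest ul ancestor, while the second filters ul elements by their descendants.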
