Reputation: 1661
I have this string
<p/><ul><li>test1<p/></li><li>test2<p/></li></ul><p/>
What I'm trying to do is extract all the "p" tags within the "li" tags, but not the "p" tags outside them.
So far I've only been able to extract all the "li" tags with
\<li\>(.*?)\</li\>
I'm lost on how to extract the "p" tags within them.
Any pointers are greatly appreciated!
Upvotes: 1
Views: 5526
Reputation: 12796
Try this. It uses a lookbehind and a lookahead so that the <li> tags are not part of the match:
(?<=<li>)(.*?<p/?>.*?)(?=</li>)
P.S. You should also fix your HTML, because the way you are using the P tags is not valid. The regex works on the HTML below:
<ul><li><p>test1<p/></li><li><p>test2<p/></li></ul>
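A minimal sketch of applying this pattern in C# (the language of the HTML Agility Pack answer) with System.Text.RegularExpressions. The class LiContentExtractor and its Extract method are hypothetical names for illustration, not part of the answer above:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class LiContentExtractor
{
    // (?<=<li>) and (?=</li>) are zero-width, so the <li> tags themselves are not
    // part of the match; the captured body must contain a <p> or <p/> tag.
    private static readonly Regex Pattern =
        new Regex(@"(?<=<li>)(.*?<p/?>.*?)(?=</li>)");

    // Returns the captured content of every <li> that contains a P tag.
    public static List<string> Extract(string html)
    {
        return Pattern.Matches(html)
                      .Cast<Match>()
                      .Select(m => m.Groups[1].Value)
                      .ToList();
    }
}
```

Called on "<ul><li><p>test1<p/></li><li><p>test2<p/></li></ul>", Extract returns "<p>test1<p/>" and "<p>test2<p/>". Because "." does not match a newline by default, this only handles list items written on a single line.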
Upvotes: 2
Reputation: 838216
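It is a lot more reliable to use an HTML parser instead of a regex. Use the HTML Agility Pack:
using System.Collections.Generic;
using System.Linq;      // for SelectMany
using HtmlAgilityPack;  // NuGet package "HtmlAgilityPack"

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml("<p/><ul><li>test1<p/></li><li>test2<p/></li></ul><p/>");
// Find every <li>, then collect the <p> elements nested inside each one
IEnumerable<HtmlNode> result = doc.DocumentNode
    .Descendants("li")
    .SelectMany(x => x.Descendants("p"));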
Upvotes: 5
Reputation: 10526
<li>(.*?<p/?>.*?)</li>
This matches all the content between <li> and </li> that also contains a <p/>. If you only want to match the <p/> itself, use:
(?<=<li>).*?(<p/?>).*?(?=</li>)
Group 1 will then contain the <p/> tag.
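To see what each of the two patterns above captures, here is a sketch in C# using System.Text.RegularExpressions on the question's original string. The class ParagraphInListItems and its two methods are hypothetical names for illustration:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class ParagraphInListItems
{
    // Group 1: everything between <li> and </li>, provided it contains a <p> or <p/>.
    public static List<string> ListItemContents(string html) =>
        Regex.Matches(html, @"<li>(.*?<p/?>.*?)</li>")
             .Cast<Match>()
             .Select(m => m.Groups[1].Value)
             .ToList();

    // Group 1: only the <p> or <p/> tag inside each list item; the lookbehind and
    // lookahead keep the match anchored between <li> and </li>.
    public static List<string> ParagraphTags(string html) =>
        Regex.Matches(html, @"(?<=<li>).*?(<p/?>).*?(?=</li>)")
             .Cast<Match>()
             .Select(m => m.Groups[1].Value)
             .ToList();
}
```

On "<p/><ul><li>test1<p/></li><li>test2<p/></li></ul><p/>", ListItemContents returns "test1<p/>" and "test2<p/>", while ParagraphTags returns "<p/>" once per list item. The two <p/> tags outside the <ul> are not matched by either pattern.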
Upvotes: 2