Liming

Reputation: 1661

Regular expression, find a word between two words

I have this string:

<p/><ul><li>test1<p/></li><li>test2<p/></li></ul><p/>

What I am trying to do is extract all the "p" tags within the "li" tags, but not the "p" tags outside of them.

So far I have only been able to extract all the "li" tags with:

\<li\>(.*?)\</li\>

I'm lost on how to extract the "p" tags within them.

Any pointers are greatly appreciated!

Upvotes: 1

Views: 5526

Answers (3)

James

Reputation: 12796

Try this. It uses a lookbehind and a lookahead so that the LI tags are not part of the match.

(?<=<li>)(.*?<p/?>.*?)(?=</li>)

P.S. You also need to fix your HTML, because the P tags are not used correctly. The regex works on the HTML below.

<ul><li><p>test1<p/></li><li><p>test2<p/></li></ul>
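
As a quick check of the pattern above, here is a minimal sketch using Python's `re` module. Python is an assumption here (the question does not name a regex engine), and the names `PATTERN` and `li_contents` are purely illustrative. It shows that the lookbehind and lookahead keep `<li>` and `</li>` out of the match, on both the original string and the adjusted HTML.

```python
import re

# Pattern from the answer: lookbehind for <li>, lookahead for </li>.
PATTERN = re.compile(r"(?<=<li>)(.*?<p/?>.*?)(?=</li>)")

original = "<p/><ul><li>test1<p/></li><li>test2<p/></li></ul><p/>"
adjusted = "<ul><li><p>test1<p/></li><li><p>test2<p/></li></ul>"


def li_contents(html):
    """Return the full text of each match (group 0), which excludes the <li> tags."""
    return [m.group(0) for m in PATTERN.finditer(html)]


print(li_contents(original))  # ['test1<p/>', 'test2<p/>']
print(li_contents(adjusted))  # ['<p>test1<p/>', '<p>test2<p/>']
```

Because the `<li>` and `</li>` tags are only asserted and never consumed, the whole match is already the inner content, so no group lookup is needed.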

Upvotes: 2

Mark Byers

Reputation: 838216

It is a lot more reliable to use an HTML parser instead of a regex. Use HTML Agility Pack:

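using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml("<p/><ul><li>test1<p/></li><li>test2<p/></li></ul><p/>");
// Collect every <p> node that is a descendant of some <li> node.
IEnumerable<HtmlNode> result = doc.DocumentNode
                                  .Descendants("li")
                                  .SelectMany(x => x.Descendants("p"));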
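
For readers not on .NET, the same parser-based idea can be sketched with Python's standard-library `html.parser`. This swaps in a different library and is not HTML Agility Pack. The class `ParagraphsInListItems` and the function `p_tags_inside_li` are hypothetical names made up for this illustration. The sketch tracks how many `<li>` elements are currently open and records a `<p>` tag only while that count is above zero.

```python
from html.parser import HTMLParser


class ParagraphsInListItems(HTMLParser):
    """Collect <p> tags that occur while at least one <li> is open."""

    def __init__(self):
        super().__init__()
        self.li_depth = 0   # number of currently open <li> elements
        self.found = []     # raw text of each <p> tag seen inside an <li>

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <p/> also arrive here by default.
        if tag == "li":
            self.li_depth += 1
        elif tag == "p" and self.li_depth > 0:
            self.found.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "li" and self.li_depth > 0:
            self.li_depth -= 1


def p_tags_inside_li(html):
    parser = ParagraphsInListItems()
    parser.feed(html)
    parser.close()
    return parser.found


html = "<p/><ul><li>test1<p/></li><li>test2<p/></li></ul><p/>"
print(len(p_tags_inside_li(html)))  # 2
```

The two `<p/>` tags outside the list are skipped because no `<li>` is open when they are seen.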

Upvotes: 5

Pindatjuh

Reputation: 10526

<li>(.*?<p/?>.*?)</li>

This matches the content of every <li> element that contains a <p/>. If you want to capture just the <p/>, use:

(?<=<li>).*?(<p/?>).*?(?=</li>)

With this pattern, group 1 captures the <p/> tag.
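
To make the difference between the two patterns concrete, here is a small sketch in Python's `re` module. The engine is an assumption, since the question does not name one, and the variable names are illustrative. `re.findall` returns the contents of group 1 when a pattern has exactly one group.

```python
import re

html = "<p/><ul><li>test1<p/></li><li>test2<p/></li></ul><p/>"

# First pattern: group 1 is everything between <li> and </li>.
contents = re.findall(r"<li>(.*?<p/?>.*?)</li>", html)

# Second pattern: group 1 is only the <p/> tag itself.
p_tags = re.findall(r"(?<=<li>).*?(<p/?>).*?(?=</li>)", html)

print(contents)  # ['test1<p/>', 'test2<p/>']
print(p_tags)    # ['<p/>', '<p/>']

# Caveat: '.' can cross a </li><li> boundary, so an <li> without a <p/>
# is merged with the next <li> that has one.
merged = re.findall(r"<li>(.*?<p/?>.*?)</li>", "<li>a</li><li>b<p/></li>")
print(merged)    # ['a</li><li>b<p/>']
```

The last case is one reason a parser tends to be more robust than either pattern when some list items have no `<p/>`.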

Upvotes: 2
