Reputation: 11
I'm very new to PHP writing and regular expressions. I need to write a Regex pattern that will allow me to "grab" the headlines in the following html tags:
<title>My news</title>
<h1>News</h1>
<h2 class=\"yiv1801001177first\">This is my first headline</h2>
<p>This is a summary of a fascinating article.</p>
<h2>This is another headline</h2>
<p>This is a summary of a fascinating article.</p>
<h2>This is the third headline</h2>
<p>This is a summary of a fascinating article.</p>
<h2>This is the last headline</h2>
<p>This is a summary of a fascinating article.</p>
So I need a pattern to match all the <h2> tags. This is my first attempt at writing a pattern, and I'm seriously struggling...
/(<h+[2])>(.*?)\<\/h2>/ is what I've attempted. Help is much appreciated!
Upvotes: 0
Views: 1542
Reputation: 336108
The easiest way to do it via regex is
#<h2\b[^>]*>(.*?)</h2>#is
This will match any h2 tag and capture its contents in backreference $1. I've used # as a regex delimiter to avoid escaping the / later on in the regex, and the i and s options to make the regex case-insensitive and to allow newlines within the tag's contents.
There are circumstances where this regex will fail, though, as pointed out correctly by others in this thread.
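As a rough sketch of how the pattern could be used from PHP, assuming the markup is already in a string (the variable names $html and $headlines are just illustrative, and the sample is the question's markup with plain quotes):

```php
<?php
// Sample markup from the question; $html is a hypothetical variable name.
$html = <<<'HTML'
<title>My news</title>
<h1>News</h1>
<h2 class="yiv1801001177first">This is my first headline</h2>
<p>This is a summary of a fascinating article.</p>
<h2>This is another headline</h2>
<p>This is a summary of a fascinating article.</p>
<h2>This is the third headline</h2>
<p>This is a summary of a fascinating article.</p>
<h2>This is the last headline</h2>
<p>This is a summary of a fascinating article.</p>
HTML;

// $matches[0] holds the full matches, $matches[1] holds capture group 1,
// i.e. the text between each opening and closing h2 tag.
preg_match_all('#<h2\b[^>]*>(.*?)</h2>#is', $html, $matches);

$headlines = $matches[1];
print_r($headlines);
```

preg_match_all collects every match rather than stopping at the first one, which is what you want when there are several headlines on the page.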
Upvotes: 1
Reputation: 481
I have only checked this in RegexBuddy, but the following regex works there:
<h2.*</h2>
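One thing worth checking before relying on it: .* is greedy, so the pattern depends on each heading sitting on its own line, as it does in the question's sample. A small sketch in PHP of what happens otherwise (the $line string is made up for illustration and is not from the question):

```php
<?php
// Hypothetical input with two headings on a single line.
$line = '<h2>First</h2><h2>Second</h2>';

// Greedy: .* runs to the last </h2> on the line.
preg_match_all('#<h2.*</h2>#', $line, $greedy);

// Lazy: .*? stops at the first </h2> it can reach.
preg_match_all('#<h2.*?</h2>#', $line, $lazy);

echo count($greedy[0]), " greedy match(es)\n";
echo count($lazy[0]), " lazy match(es)\n";
```

With the greedy version the two headings come back as one match spanning the whole line, while the lazy version returns them separately.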
Upvotes: 0
Reputation: 4176
I'm not too familiar with PHP, but in cases like this it's usually easier to use an XML parser (which will automatically detect <h2> as well as <h2 class="whatever">) rather than a regex, to which you'll have to add a bunch of special cases. JavaScript, for example, has the XML DOM for exactly this purpose; I'd be surprised if PHP didn't have something similar.
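For what it's worth, PHP's bundled DOM extension does offer this through the DOMDocument class. A minimal sketch, assuming the extension is enabled and the markup is in a string (the names $html, $doc and $headlines are illustrative, and the sample is shortened from the question):

```php
<?php
// Shortened sample from the question; $html is a hypothetical variable name.
$html = <<<'HTML'
<title>My news</title>
<h1>News</h1>
<h2 class="yiv1801001177first">This is my first headline</h2>
<p>This is a summary of a fascinating article.</p>
<h2>This is another headline</h2>
<p>This is a summary of a fascinating article.</p>
HTML;

// Collect parser warnings internally instead of printing them,
// since real-world HTML is rarely perfectly formed.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Every h2 element is found regardless of its attributes.
$headlines = [];
foreach ($doc->getElementsByTagName('h2') as $node) {
    $headlines[] = trim($node->textContent);
}

print_r($headlines);
```

The parser handles attributes, casing and line breaks inside the tag on its own, so none of those need special cases.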
Upvotes: 1