Jo W

Reputation: 11

Writing Regex pattern for HTML tags

I'm very new to PHP and regular expressions. I need to write a regex pattern that will let me "grab" the headlines from the following HTML:

<title>My news</title>
<h1>News</h1>

<h2 class=\"yiv1801001177first\">This is my first headline</h2>
<p>This is a summary of a fascinating article.</p>

<h2>This is another headline</h2>
<p>This is a summary of a fascinating article.</p>

<h2>This is the third headline</h2>
<p>This is a summary of a fascinating article.</p>

<h2>This is the last headline</h2>
<p>This is a summary of a fascinating article.</p>

So I need a pattern that matches all the <h2> tags. This is my first attempt at writing a pattern, and I'm seriously struggling.
What I've attempted so far is /(<h+[2])>(.*?)\<\/h2>/. Any help is much appreciated!

Upvotes: 0

Views: 1542

Answers (3)

Tim Pietzcker

Reputation: 336108

The easiest way to do it via regex is

#<h2\b[^>]*>(.*?)</h2>#is

This will match any h2 tag and capture its contents in group 1 ($1). I've used # as the regex delimiter to avoid having to escape the / in the closing tag, and the i and s modifiers to make the regex case-insensitive and to let the dot match newlines within the tag's contents.
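As a rough illustration, the pattern above could be passed to preg_match_all() as in the sketch below. The sample markup is a shortened copy of the HTML from the question, and the variable names ($html, $pattern, $headlines) are only illustrative.

```php
<?php
// Sample markup taken from the question (trimmed to two headlines).
$html = <<<HTML
<title>My news</title>
<h1>News</h1>

<h2 class="yiv1801001177first">This is my first headline</h2>
<p>This is a summary of a fascinating article.</p>

<h2>This is another headline</h2>
<p>This is a summary of a fascinating article.</p>
HTML;

// i = case-insensitive, s = the dot also matches newlines.
$pattern = '#<h2\b[^>]*>(.*?)</h2>#is';

// preg_match_all() returns the number of full matches (or false on error)
// and fills $matches; $matches[1] holds capture group 1 for every match.
$count = preg_match_all($pattern, $html, $matches);
$headlines = $matches[1];

foreach ($headlines as $headline) {
    echo trim($headline), PHP_EOL;
}
```

With this input, $headlines should contain the text of the two <h2> elements, and the <h1> and <p> elements are left alone.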

There are circumstances where this regex will fail, though, as others in this thread have correctly pointed out.

Upvotes: 1

kklobucki

Reputation: 481

I have only checked this in RegexBuddy, but the following regex works there:

<h2.*</h2>
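One thing to be aware of with this pattern is that .* is greedy. The sketch below compares it with the lazy form .*? on a single line that contains two headlines. That input line is a made-up example, not part of the question's HTML, where each <h2> sits on its own line.

```php
<?php
// Two headlines on one line: a hypothetical input, not from the question.
$line = '<h2>First</h2><p>Summary</p><h2>Second</h2>';

// Greedy: .* runs to the last </h2> on the line, giving one long match.
preg_match_all('#<h2.*</h2>#', $line, $greedy);

// Lazy: .*? stops at the first </h2>, giving one match per headline.
preg_match_all('#<h2.*?</h2>#', $line, $lazy);

echo count($greedy[0]), PHP_EOL; // number of greedy matches
echo count($lazy[0]), PHP_EOL;   // number of lazy matches
```

Without the s modifier the dot does not cross newlines, which is why the greedy version still behaves when every headline is on a separate line.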

Upvotes: 0

Alexander Tsepkov

Reputation: 4176

I'm not too familiar with PHP, but in cases like this it's usually easier to use an XML parser, which will automatically detect <h2> as well as <h2 class="whatever">, rather than a regex, to which you'll have to add a bunch of special cases. JavaScript, for example, has the XML DOM for exactly this purpose; I'd be surprised if PHP didn't have something similar.
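PHP does ship a DOM extension, so the parser approach described above might look roughly like the sketch below. It assumes the dom extension is enabled (it usually is by default) and uses a shortened copy of the question's HTML. The variable names are illustrative.

```php
<?php
// Shortened copy of the markup from the question.
$html = <<<HTML
<h1>News</h1>
<h2 class="yiv1801001177first">This is my first headline</h2>
<p>This is a summary of a fascinating article.</p>
<h2>This is another headline</h2>
<p>This is a summary of a fascinating article.</p>
HTML;

$doc = new DOMDocument();

// Collect parser warnings internally instead of printing them,
// since real-world HTML is often not perfectly well-formed.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// getElementsByTagName() finds every <h2>, with or without attributes.
$headlines = [];
foreach ($doc->getElementsByTagName('h2') as $node) {
    $headlines[] = trim($node->textContent);
}

print_r($headlines);
```

Because the parser works on the element tree, attributes such as class need no special handling here.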

Upvotes: 1
