Brad

Reputation: 163430

RegEx for a specific word within HTML

I am new to RegEx and haven't been able to figure out what is likely a simple problem. I need to match a list of specific words within a block of HTML.

For example, I have a list of words:

ASDF, ZXCV, QWER

And the following HTML:

<p>ASDF jumped over the ZXCV of QWER.</p>

I am using preg_replace_callback() with an array of RegEx patterns to match, such as /\bASDF\b/, but that will only find ASDF surrounded by spaces, and doesn't take into account symbols, such as the beginning/end of tags or punctuation.

I've been staring at RegEx sheets for hours, and am stuck on this one. Any advice you could give me to get started would be most appreciated. Thank you for your time.

Upvotes: 0

Views: 244

Answers (3)

nickytonline

Reputation: 6981

Do you want to match any of those words or all of them? If it's any, you can just use (ASDF|ZXCV|QWER). If it's all of them, what are the criteria for matching all the words?

Check out http://www.regular-expressions.info as a resource. I also strongly recommend that you pick up a copy of Mastering Regular Expressions by Jeffrey Friedl, http://regex.info.
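A minimal sketch of the "any of those words" case, written in Python rather than PHP purely for illustration. Here re.sub with a function argument stands in for preg_replace_callback(). The word list comes from the alternation above, while the highlight callback and the square-bracket markers are hypothetical.

```python
import re

# Word list taken from the alternation (ASDF|ZXCV|QWER).
WORDS = ["ASDF", "ZXCV", "QWER"]

# One pattern for all words: alternation wrapped in word boundaries.
# re.escape guards against words that contain regex metacharacters.
PATTERN = re.compile(r"\b(?:" + "|".join(re.escape(w) for w in WORDS) + r")\b")


def highlight(match):
    # Hypothetical callback: wrap each matched word in square brackets.
    return "[" + match.group(0) + "]"


html = "<p>ASDF jumped over the ZXCV of QWER.</p>"
result = PATTERN.sub(highlight, html)
print(result)  # <p>[ASDF] jumped over the [ZXCV] of [QWER].</p>
```

A single combined pattern means one pass over the input, instead of one pass per word.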

Upvotes: 1

alex

Reputation: 490403

You want to take the HTML tags out of the equation and only work with text nodes.

Therefore, strip away the HTML or use something like DOMDocument to parse the elements, and then use the regex on the text nodes.

Also, \b already treats > as a boundary, because > is not a word character.
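A rough sketch of the text-node idea, assuming Python's standard html.parser module in place of PHP's DOMDocument. It runs the regex only on text between tags, so attribute values and tag names are left alone. The class name TextNodeReplacer, the helper replace_in_text_nodes, and the bracket callback are all hypothetical. It also does not special-case script or style contents.

```python
import re
from html.parser import HTMLParser

PATTERN = re.compile(r"\b(?:ASDF|ZXCV|QWER)\b")


class TextNodeReplacer(HTMLParser):
    """Rebuilds the markup, applying the regex to text nodes only."""

    def __init__(self, callback):
        # convert_charrefs=False keeps entities such as &nbsp; as written.
        super().__init__(convert_charrefs=False)
        self.callback = callback
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Emit the start tag exactly as it appeared in the source.
        self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.parts.append("</" + tag + ">")

    def handle_data(self, data):
        # Text node: the only place where replacement happens.
        self.parts.append(PATTERN.sub(self.callback, data))

    def handle_entityref(self, name):
        self.parts.append("&" + name + ";")

    def handle_charref(self, name):
        self.parts.append("&#" + name + ";")

    def handle_comment(self, data):
        self.parts.append("<!--" + data + "-->")

    def handle_decl(self, decl):
        self.parts.append("<!" + decl + ">")


def replace_in_text_nodes(html, callback):
    parser = TextNodeReplacer(callback)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


def bracket(match):
    # Hypothetical callback: mark each matched word with brackets.
    return "[" + match.group(0) + "]"


html = '<p class="ASDF">ASDF jumped over the ZXCV of QWER.</p>'
result = replace_in_text_nodes(html, bracket)
print(result)  # <p class="ASDF">[ASDF] jumped over the [ZXCV] of [QWER].</p>
```

The ASDF inside the class attribute is untouched, which a regex run over the raw HTML string would not guarantee.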

Upvotes: 1

Seth Robertson

Reputation: 31461

\bASDF\b

will match all of the following:

<p>ASDF</p>
<p>foo ASDF bar</p>
<p>&nbsp;ASDF&gt;</p>
<p>foo ASDF.</p>

What are you having trouble not matching?
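A quick way to check this claim is to run the pattern against the four sample strings. This is sketched in Python, on the assumption that \b behaves the same as in PCRE for ASCII input like this. The non-matching samples are hypothetical additions that show where \b refuses to match.

```python
import re

pattern = re.compile(r"\bASDF\b")

# The four sample strings from the answer.
samples = [
    "<p>ASDF</p>",
    "<p>foo ASDF bar</p>",
    "<p>&nbsp;ASDF&gt;</p>",
    "<p>foo ASDF.</p>",
]
matched = [bool(pattern.search(s)) for s in samples]
print(matched)  # [True, True, True, True]

# Hypothetical counter-examples: a letter, digit, or underscore next to
# the word means there is no word boundary at that position.
non_samples = ["<p>ASDFG</p>", "<p>XASDF</p>", "<p>ASDF_1</p>"]
not_matched = [bool(pattern.search(s)) for s in non_samples]
print(not_matched)  # [False, False, False]
```

Characters such as >, <, ;, & and . are not word characters, so \b matches between them and a letter.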

Upvotes: 1
