Michael

Reputation: 5675

Regular expressions help

If I had the following HTML:

<li><a href="aaa"> Thisislink1</a></li>
<li><a href="abcdef"> Thisisanotherlink</a></li>
<li><a href="12345"> Onemorelink</a></li>

Each link differs in length and value.

How can I search the text inside the links (i.e. Thisislink1, Thisisanotherlink and Onemorelink) for a search phrase, say 'another'? In this example only 'Thisisanotherlink' would be returned, but if I changed the search phrase to 'link', all 3 values would be returned.

Upvotes: 0

Views: 92

Answers (3)

Williham Totland

Reputation: 29039

This needs to be done in two passes, with an optional third:

  1. Extract the text from all links in the document. XSL or XPath should be workable for this purpose. As you extract text, keep a copy of the DOM around so you can attach information to it and the text, telling you where the text was extracted from (you might not need this info later). As an alternative, just attach the contents of the href attribute to the text.

    Be sure to extract all the text you need (e.g. title attributes, or the alt text of <a href><img alt></a> type constructs).

  2. Search the extracted text for the phrase you are looking for.

  3. (Optional) use the information you set earlier to map back to the DOM to figure out what element you gathered the text from, and highlight it. If you extracted the href attribute, you could just make a new link using this and the matching text.
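The steps above can be sketched roughly as follows. This is only an illustration, not the answerer's own code: it assumes Python (the question names no language) and uses the standard-library `html.parser` instead of XSL or XPath. The names `LinkTextExtractor`, `search_links` and `SAMPLE_HTML` are made up for the example, and only the href attribute is kept alongside the text, as in the simpler alternative from step 1.

```python
from html.parser import HTMLParser

# Sample markup taken from the question.
SAMPLE_HTML = """
<li><a href="aaa"> Thisislink1</a></li>
<li><a href="abcdef"> Thisisanotherlink</a></li>
<li><a href="12345"> Onemorelink</a></li>
"""


class LinkTextExtractor(HTMLParser):
    """Pass 1: collect an (href, text) pair for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []        # finished (href, text) pairs
        self._in_link = False  # True while inside an <a> element
        self._href = None      # href of the <a> currently open
        self._parts = []       # text fragments seen inside it

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self._href = dict(attrs).get("href")
            self._parts = []

    def handle_data(self, data):
        if self._in_link:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            text = "".join(self._parts).strip()
            self.links.append((self._href, text))
            self._in_link = False


def search_links(html, phrase):
    """Pass 2: keep only the links whose text contains the phrase."""
    parser = LinkTextExtractor()
    parser.feed(html)
    parser.close()
    return [(href, text) for href, text in parser.links if phrase in text]


if __name__ == "__main__":
    for href, text in search_links(SAMPLE_HTML, "another"):
        print(href, text)
```

Because each result carries its href, the optional third step (rebuilding a link from the match) needs no further lookup. The sketch ignores title and alt attributes; those would have to be collected in `handle_starttag` if they matter.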

Upvotes: 0

Mark Byers

Reputation: 839224

Don't use regex. Use DOMDocument.

Upvotes: 2

mkorpela

Reputation: 4395

/\w*another\w*/
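As a hedged illustration of how this pattern behaves, here is a small Python sketch (Python and the helper name `matching_words` are assumptions, not part of the answer). It builds the same `\w*phrase\w*` pattern for an arbitrary search phrase and applies it to link text that has already been extracted. Run against raw HTML instead, the pattern would also match inside tag names and attribute values.

```python
import re


def matching_words(texts, phrase):
    """Return every run of word characters that contains the phrase."""
    # re.escape keeps characters such as '.' or '+' in the phrase literal.
    pattern = re.compile(r"\w*" + re.escape(phrase) + r"\w*")
    return [m.group(0) for text in texts for m in pattern.finditer(text)]


if __name__ == "__main__":
    link_texts = ["Thisislink1", "Thisisanotherlink", "Onemorelink"]
    print(matching_words(link_texts, "another"))
    print(matching_words(link_texts, "link"))
```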

Upvotes: 0
