Sanjay Khatri

Reputation: 4211

regular expression anchor tag

I am using PHP and am having trouble parsing the href from an anchor tag along with its text.

Example: an anchor tag containing the text http://www.test.com

Like this: <a href="http://www.test.com" title="test">http://www.test.com</a>

I want to match all of the text in the anchor tag.

Thanks in advance.

Upvotes: 1

Views: 3289

Answers (3)

Daniel Egeberg

Reputation: 8382

Use DOM:

$text = '<a href="http://www.test.com" title="test">http://www.test.com</a> something else hello world';
$dom = new DOMDocument();
$dom->loadHTML($text);

foreach ($dom->getElementsByTagName('a') as $a) {
    echo $a->textContent;
}

DOM is specifically designed to parse XML and HTML. It will be more robust than any regex solution you can come up with.
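Since the question asks about the href as well as the link text, the same DOM approach can be extended to return both. The following is a minimal sketch, not part of the original answer: the function name `extractLinks` and the array keys are hypothetical, and the `libxml_use_internal_errors()` calls are an added assumption to keep warnings about imperfect markup quiet.

```php
<?php
// Hypothetical helper: collect the href and link text of every <a> element.
function extractLinks(string $html): array
{
    $dom = new DOMDocument();

    // Assumption: real-world HTML is often malformed, so collect libxml
    // warnings internally instead of emitting them, then restore the setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        $links[] = [
            'href' => $a->getAttribute('href'), // attribute value
            'text' => $a->textContent,          // text inside the tag
        ];
    }
    return $links;
}

$html = '<a href="http://www.test.com" title="test">http://www.test.com</a> something else hello world';
foreach (extractLinks($html) as $link) {
    echo $link['href'], ' => ', $link['text'], PHP_EOL;
}
```

Because `textContent` concatenates the text of all descendant nodes, nested formatting tags such as `<b>` inside the link do not need special handling.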

Upvotes: 6

Recurse

Reputation: 3585

If you have already obtained the anchor tag you can extract the href attribute via a regex easily enough:

<a [^>]*href="([^"]*)"[^>]*>
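As a rough illustration of how such a pattern might be applied in PHP, here is a small sketch. It assumes the capture group uses a `*` quantifier (`([^"]*)`) so that the whole attribute value is captured rather than a single character, and that the attribute is double-quoted. The function name `extractHref` is hypothetical.

```php
<?php
// Hypothetical helper: pull the href value out of a single opening <a ...> tag.
// Assumes the attribute value is wrapped in double quotes.
function extractHref(string $tag): ?string
{
    // '#' is used as the delimiter so that slashes need no escaping.
    $pattern = '#<a [^>]*href="([^"]*)"[^>]*>#';

    if (preg_match($pattern, $tag, $matches)) {
        return $matches[1]; // contents of the first capture group
    }
    return null; // no href found
}

echo extractHref('<a href="http://www.test.com" title="test">'), PHP_EOL;
```

This handles the href appearing after other attributes, but it will not match single-quoted or unquoted attribute values.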

If you instead want to extract the contents of the tag and you know what you are doing, it isn't too hard to write a simple recursive descent parser, using cascading regexes, that will parse all but the most pathological cases. Unfortunately, PHP isn't a good language for learning this technique, so I wouldn't recommend using this project to learn it.

So if it is the contents you are after, not the attribute, then @katrielalex is right: don't parse HTML with regex. You will run into a world of hurt with nested formatting tags and other legal HTML that isn't compatible with regular expressions.

Upvotes: -1

Peter O'Callaghan

Reputation: 6186

Assuming you wish to select the link text of an anchor link with that href, then something like this should work...

$input = '<a href="http://www.test.com" title="test">http://www.test.com</a>';
$pattern = '#<a href="http://www\.test\.com"[^>]*>(.*?)</a>#';

if (preg_match($pattern, $input, $out)) {
    echo $out[1];
}

This is not technically perfect (in theory, a literal > can probably appear inside one of the tags, and the pattern also expects href to be the first attribute), but it will work in most cases. As several of the comments have mentioned, though, you should be using a DOM parser.

Upvotes: -1
