Shishant

Reputation: 9294

Parsing HTML source to extract href values from anchor and link tags

I am looking for an HTML parser in PHP that can help me extract href values from HTML source.

I looked at phpQuery, and it is the best I have found, but it is overkill for my needs and consumes a lot of CPU doing extra work that I don't need.

I also tried:

$dom = new DOMDocument();
$dom->loadHTML($html);

but it has problems parsing HTML5 tags.

Is there any better library/class or a way to do it?

Upvotes: 0

Views: 4947

Answers (3)

Pramod Kumar

Reputation: 99

I used this:

$html = '<a href="http://google.com"><img src="images/a.png" /></a>';
preg_match('/href="([^\s"]+)/', $html, $match);
echo '<pre>';
print_r($match);
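
Note that preg_match stops at the first match, so the snippet above returns only one href. A minimal sketch that collects every one with preg_match_all, restricted to <a> and <link> tags as the question asks. The sample markup is made up for illustration, and the pattern assumes double-quoted attribute values only:

```php
<?php
// Hypothetical sample input; only double-quoted href values are handled here.
$html = '<link rel="stylesheet" href="css/site.css">'
      . '<a href="http://google.com"><img src="images/a.png" /></a>'
      . '<a class="nav" href="about.html">About</a>';

// Match href="..." only inside <a ...> or <link ...> opening tags.
// \s before href avoids matching attributes such as data-href.
preg_match_all('/<(?:a|link)\b[^>]*?\shref="([^"]*)"/i', $html, $matches);

// $matches[1] is the list of captured href values, in document order.
$hrefs = $matches[1];
print_r($hrefs);
```

This is still a regex over markup, so it will miss single-quoted or unquoted values and can be confused by unusual input.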

Upvotes: 0

Nick

Reputation: 783

simplehtmldom is a handy PHP class for parsing HTML:

http://simplehtmldom.sourceforge.net/

Upvotes: 0

Michael McTiernan

Reputation: 5313

Well, you can use regular expressions to extract the data:

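$html = "This is some stuff right here. <a href='index.html'>Check this out!</a> <a href=herp.html>And this is another thing!</a> <a href=\"derp.html\">OH MY GOSH</a>";
// Capture the href value whether it is single-quoted, double-quoted, or unquoted.
preg_match_all('/href=[\'"]?([^\s\>\'"]*)[\'"\>]/', $html, $matches);
// $matches[1] holds the captured values; fall back to false when nothing matched.
$hrefs = ($matches[1] ? $matches[1] : false);
print_r($hrefs);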
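
If the regex turns out to be too brittle, the built-in DOM extension may still be usable. This is a minimal sketch, assuming the HTML5 problems mentioned in the question are libxml warnings about unknown tags rather than outright parse failures. The sample markup is hypothetical:

```php
<?php
// Hypothetical sample input containing HTML5 elements (nav, section).
$html = '<!DOCTYPE html><html><head><link rel="stylesheet" href="css/site.css"></head>'
      . '<body><nav><a href="index.html">Home</a></nav>'
      . '<section><a href="derp.html">OH MY GOSH</a> <a name="top">no href</a></section>'
      . '</body></html>';

// Collect libxml warnings internally instead of emitting them as PHP warnings.
$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Select every <a> and <link> element that carries an href attribute.
$xpath = new DOMXPath($dom);
$hrefs = [];
foreach ($xpath->query('//a[@href] | //link[@href]') as $node) {
    $hrefs[] = $node->getAttribute('href');
}
print_r($hrefs);
```

This needs no third-party library, and the XPath query skips anchors that have no href at all.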

Upvotes: 1
