PeeHaa

Reputation: 72729

Parsing an HTML element

I've used DOM before to parse websites in PHP.

I know I should never try to parse HTML using regex.

But... (I don't want to start a shitstorm, just an answer :P )

If I want to parse just one HTML element, e.g.

<a href="http://example.com/something?id=1212132131133&filter=true" rel="blebeleble" target="_blank">

To find the content of the href attribute, can I use DOM to parse this string (which I probably should, if it's possible), or do I need a complete web page before the DOM can parse it?

Upvotes: 1

Views: 420

Answers (2)

Lightness Races in Orbit

Reputation: 385405

Yes, you can do this.

You have to:

  • pretend that the <a /> tag constitutes the whole document;
  • ensure that you close the tag;
  • ensure that the input string is valid XML (note that I've replaced your & with &amp;, the proper HTML entity).

Code:

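<?php
// The tag is self-closed and "&" is escaped as "&amp;", so the string is well-formed XML.
$str = '<a href="http://example.com/something?id=1212132131133&amp;filter=true" rel="blebeleble" target="_blank" />';

$dom = new DOMDocument();
$dom->loadXML($str);

// The <a> element is the document's only child node; read its href attribute.
var_dump($dom->childNodes->item(0)->attributes->getNamedItem('href')->value);

// Output: string(57) "http://example.com/something?id=1212132131133&filter=true"
?>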

PS, if you want to include the link text, that's ok too:

$str = '<a href="http://example.com/something?id=1212132131133&amp;filter=true" rel="blebeleble" target="_blank">Click here!</a>';
// .. code .. //

// Output: string(57) "http://example.com/something?id=1212132131133&filter=true"
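
The closing-tag and &amp; requirements above come from parsing the string as XML. As a rough alternative sketch (not part of the original answer), DOMDocument::loadHTML uses libxml's more forgiving HTML parser, which should accept the tag exactly as posted in the question. The variable names $link and $href are made up for illustration, and the handling of the bare & is an assumption about libxml's error recovery:

```php
<?php
// Sketch only: the question's original tag, unclosed and with a raw "&".
$str = '<a href="http://example.com/something?id=1212132131133&filter=true" rel="blebeleble" target="_blank">';

// Collect libxml parse warnings (e.g. about the bare "&") instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($str); // the HTML parser wraps the fragment in <html><body>
libxml_clear_errors();

// Look the element up by tag name, since it is no longer the document root.
$link = $dom->getElementsByTagName('a')->item(0);
$href = $link !== null ? $link->getAttribute('href') : null;

var_dump($href);
```

The trade-off is that the HTML parser silently repairs the input, so a malformed tag may yield a surprising value rather than an error; loadXML fails loudly instead.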

Upvotes: 4

Sajid

Reputation: 4421

You can easily adapt a regex to parse just this tag, given that you've already isolated it. An example can be found here. It's for Java, so remember to move the case-insensitive modifier to the end of the pattern!
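
The linked example isn't reproduced here, so the following is only a guess at what an adapted PHP version might look like. The pattern itself is hypothetical, and it assumes the tag has already been isolated and that the href value is quoted:

```php
<?php
// The isolated tag from the question.
$tag = '<a href="http://example.com/something?id=1212132131133&filter=true" rel="blebeleble" target="_blank">';

// Hypothetical pattern: "href", optional whitespace around "=", then a value
// wrapped in matching single or double quotes. In PHP the case-insensitive
// modifier (i) goes after the closing delimiter.
$pattern = '/\bhref\s*=\s*(["\'])(.*?)\1/i';

// Group 1 is the quote character, group 2 is the attribute value.
$href = preg_match($pattern, $tag, $m) === 1 ? $m[2] : null;

var_dump($href);
```

This will not cope with unquoted attribute values or arbitrary markup, which is why it only makes sense for a single, already-isolated tag.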

Upvotes: 0
