Parsing an HTML element

Question

I've used DOM before to parse websites in PHP.

I know I should never try to parse HTML using regex.

But... (I don't want to start a shitstorm, just an answer :P )

If I want to parse just one HTML element, e.g.

<a href="http://example.com/something?id=1212132131133&filter=true">

and find the content of its href attribute, can I use DOM to parse this string (which I'd prefer, if it's possible), or do I need a complete web page to be able to parse it with DOM?

Lightness Races in Orbit · Accepted Answer

Yes, you can do this.

You have to:

pretend that the tag constitutes the whole document;

ensure that you close the tag;

ensure that the input string is valid XML (note that I've replaced your & with &amp;, the proper HTML entity).

Code:

<?php
$str = '<a href="http://example.com/something?id=1212132131133&amp;filter=true" />';

$dom = new DOMDocument();
$dom->loadXML($str);
var_dump($dom->childNodes->item(0)->attributes->getNamedItem('href')->value);

// Output: string(57) "http://example.com/something?id=1212132131133&filter=true"
?>
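To see why the three requirements above matter, here is a minimal sketch of what happens when they are not met. The helper name fragmentAttribute is hypothetical (it is not part of the answer or of PHP); it only wraps the same loadXML call and returns null when the fragment is not well-formed XML, for example because the tag is unclosed or contains a bare &.

```php
<?php
// Hypothetical helper (an illustration, not from the answer): returns the
// value of an attribute on the root element of a one-element fragment,
// or null if the fragment is not well-formed XML or lacks the attribute.
function fragmentAttribute(string $fragment, string $name): ?string
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $ok = $dom->loadXML($fragment);

    // Restore the previous error-handling state.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$ok || $dom->documentElement === null) {
        return null; // e.g. unclosed tag or bare & in the input
    }

    $root = $dom->documentElement;
    return $root->hasAttribute($name) ? $root->getAttribute($name) : null;
}

// Closed tag, & written as &amp;: parses as a complete XML document.
$good = '<a href="http://example.com/something?id=1212132131133&amp;filter=true" />';
// Unclosed tag with a bare &: not well-formed, so loadXML fails.
$bad  = '<a href="http://example.com/something?id=1212132131133&filter=true">';

var_dump(fragmentAttribute($good, 'href')); // string(57) "http://example.com/something?id=1212132131133&filter=true"
var_dump(fragmentAttribute($bad, 'href'));  // NULL
```

Returning null rather than throwing is just one possible design choice; it keeps the caller's check to a single comparison.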

PS, if you want to include the link text, that's ok too:

$str = '<a href="http://example.com/something?id=1212132131133&amp;filter=true">Click here!</a>';
// .. code .. //

// Output: string(57) "http://example.com/something?id=1212132131133&filter=true"
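If you also want to read the link text itself, something along these lines should work. This is a sketch under the same assumptions as above (the tag is the whole document, it is closed, and & is written as &amp;); it uses documentElement and getAttribute as a shorter route to the same node and attribute, and the variable name $link is my own choice.

```php
<?php
$str = '<a href="http://example.com/something?id=1212132131133&amp;filter=true">Click here!</a>';

$dom = new DOMDocument();
$dom->loadXML($str);

// The <a> element is the root, because the tag is the whole document.
$link = $dom->documentElement;

$href = $link->getAttribute('href'); // attribute value, with &amp; decoded to &
$text = $link->textContent;          // text between the opening and closing tags

var_dump($href); // string(57) "http://example.com/something?id=1212132131133&filter=true"
var_dump($text); // string(11) "Click here!"
```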
