retrieving certain attributes using DOMDocument

Question

I'm trying to figure out how to parse an HTML page to get a form's action value, the labels within the form tag, and the input field names. I took a look at DOMDocument on php.net, and it tells me to get a child node, but all that does is give me errors saying it doesn't exist. I also tried calling print_r on the variable holding the HTML content, and all that shows me is length=1. Can someone show me a few samples I can use? The php.net documentation is confusing to follow.

$dom = new DOMDocument();
$dom->preserveWhiteSpace = FALSE;
$dom->loadHTML($content);

$form = $dom->getElementsByTagName('form');

print_r($form);

FuzzyTree · Accepted Answer

I suggest using DOMXPath instead of getElementsByTagName because it lets you select attribute values directly. Its query method returns a DOMNodeList object, just like getElementsByTagName does. The @ in @action indicates that we're selecting an attribute rather than an element.

$doc = new DOMDocument();
$doc->loadHTML($content);
$xpath = new DOMXPath($doc);
// item(0) is the first matching attribute node, or null if nothing matched
$action = $xpath->query('//form/@action')->item(0);
var_dump($action);
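Note that item(0) gives you a DOMAttr node, not a string, which is also why print_r on a DOMNodeList only shows a length. To get the actual action value, read the node's nodeValue. Here is a minimal sketch; the markup assigned to $content is a made-up sample, not the asker's page:

```php
<?php
// Hypothetical sample markup standing in for $content.
$content = '<html><body><form action="/login.php" method="post">'
         . '<label for="user">Username</label><input type="text" name="user">'
         . '</form></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($content);
$xpath = new DOMXPath($doc);

// item(0) returns a DOMAttr node, or null when no form has an action attribute.
$attr = $xpath->query('//form/@action')->item(0);
$actionValue = $attr !== null ? $attr->nodeValue : null;

echo $actionValue, "\n";
```

The null check matters because item(0) returns null rather than throwing when the query matches nothing.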

Similarly, to get the first input:

$input = $xpath->query('//form/input')->item(0);

To get all input fields:

// Run the query once and iterate over the resulting DOMNodeList
foreach ($xpath->query('//form/input') as $input) {
    var_dump($input);
}
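Putting it together for what the question asks (the action value, the label text, and the input names), something like the following should work. This is only a sketch under a few assumptions: the markup in $content is a hypothetical sample, the page contains at least one form, and I use .//input rather than /input so that inputs nested inside other tags (a p or div inside the form) are matched too.

```php
<?php
// Hypothetical sample markup; replace with the page's real HTML.
$content = '<html><body><form action="/signup.php" method="post">'
         . '<p><label for="email">Email</label><input type="text" name="email"></p>'
         . '<p><label for="pw">Password</label><input type="password" name="pw"></p>'
         . '<input type="submit" name="go" value="Sign up">'
         . '</form></body></html>';

// Real-world pages often contain invalid markup; buffer libxml warnings instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($content);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Take the first form on the page (assumption: there is at least one).
$form = $xpath->query('//form')->item(0);
if ($form === null) {
    throw new RuntimeException('No form element found');
}

// $form is a DOMElement, so attributes can be read with getAttribute().
$action = $form->getAttribute('action');

// Passing $form as the second argument makes the query relative to that form.
$labels = [];
foreach ($xpath->query('.//label', $form) as $label) {
    $labels[] = trim($label->textContent);
}

// Only inputs that actually have a name attribute.
$names = [];
foreach ($xpath->query('.//input[@name]', $form) as $input) {
    $names[] = $input->getAttribute('name');
}

print_r(['action' => $action, 'labels' => $labels, 'names' => $names]);
```

If the page has several forms, you can loop over $xpath->query('//form') and run the two relative queries for each one.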

If you're not familiar with XPath, I recommend viewing these examples.
