Exploit

Reputation: 6386

Retrieving certain attributes using DOMDocument

I'm trying to figure out how to parse an HTML page to get a form's action value, the labels within the form tag, and the input field names. I took a look at the DOMDocument pages on php.net, which tell me to get a child node, but all that does is give me errors saying it doesn't exist. I also tried print_r on the variable holding the result of getElementsByTagName, and all it shows me is length=1. Can someone show me a few samples that I can use? The php.net documentation is confusing to follow.

<?php

$content = "some-html-source";
$content = preg_replace("/&(?!(?:apos|quot|[gl]t|amp);|#)/", '&amp;', $content);

$dom = new DOMDocument;
$dom->preserveWhiteSpace = FALSE;
$dom->loadHTML($content);

$form = $dom->getElementsByTagName('form');

print_r($form);

Upvotes: 1

Views: 50

Answers (1)

FuzzyTree

Reputation: 32392

I suggest using DOMXPath instead of getElementsByTagName because it lets you select attribute nodes directly, and its query() method returns a DOMNodeList object just like getElementsByTagName. The @ in @action indicates that we're selecting an attribute rather than an element.

$doc = new DOMDocument();
$doc->loadHTML($content);
$xpath = new DOMXPath($doc);
$action = $xpath->query('//form/@action')->item(0);
var_dump($action);
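
Note that the query above yields a DOMAttr node, not a string, so var_dump prints an object. A minimal sketch of reading the actual value follows; the sample markup and the /login.php URL are made up for illustration, and the null check covers the case where no form has an action attribute.

```php
<?php
// Hypothetical sample markup, for illustration only.
$content = '<html><body><form action="/login.php" method="post">'
         . '<input type="text" name="user"></form></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($content);
$xpath = new DOMXPath($doc);

// item(0) returns null when the query matches nothing.
$attr = $xpath->query('//form/@action')->item(0);

// DOMAttr exposes the attribute's text through its value property.
$action = $attr !== null ? $attr->value : null;

echo $action, "\n";
```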

Similarly, to get the first input:

$input = $xpath->query('//form/input')->item(0);

To get all input fields, run the query once and iterate over the resulting DOMNodeList:

foreach ($xpath->query('//form/input') as $input) {
    var_dump($input);
}
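
Since the question also asks for the labels and the input names, here is a rough sketch of collecting both. The sample markup is hypothetical. It uses //form//input rather than //form/input on the assumption that inputs may be nested inside other elements such as a div: the single slash matches only direct children of the form, while the double slash matches descendants at any depth.

```php
<?php
// Hypothetical sample markup, for illustration only.
$content = '<html><body><form action="/login.php" method="post">'
         . '<div><label for="user">Username</label>'
         . '<input type="text" name="user" id="user"></div>'
         . '<div><label for="pass">Password</label>'
         . '<input type="password" name="pass" id="pass"></div>'
         . '</form></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($content);
$xpath = new DOMXPath($doc);

// Collect the name attribute of every input at any depth inside a form.
// getAttribute returns an empty string if the attribute is missing.
$names = [];
foreach ($xpath->query('//form//input') as $input) {
    $names[] = $input->getAttribute('name');
}

// Collect the visible text of every label inside a form.
$labels = [];
foreach ($xpath->query('//form//label') as $label) {
    $labels[] = trim($label->textContent);
}

print_r($names);
print_r($labels);
```

Calling print_r on the DOMNodeList itself shows little more than its length, as seen in the question, which is why the nodes are iterated and their values read one by one.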

If you're not familiar with XPath, I recommend viewing these examples.

Upvotes: 1
