Reputation: 6386
I'm trying to figure out how parse an html page to get a forms action value, the labels within the form tab as well as the input field names. I took at look at php.net Domdocument and it tells me to get a childnode but all that does is give me errors that it doesnt exist. I also tried doing print_r of the variable holding the html content and all that shows me is length=1. Can someone show me a few samples that i can use because php.net is confusing to follow.
<?php
$content = "some-html-source";
$content = preg_replace("/&(?!(?:apos|quot|[gl]t|amp);|#)/", '&', $content);
$dom = new DOMDocument;
$dom->preserveWhiteSpace = FALSE;
$dom->loadHTML($content);
$form = $dom->getElementsByTagName('form');
print_r($form);
Upvotes: 1
Views: 50
Reputation: 32392
I suggest using DomXPath
instead of getElementsByTagName
because it allows you to select attribute values directly and returns a DOMNodeList
object just like getElementsByTagName
. The @
in @action
indicates that we're selecting by attribute.
$doc = new DOMDocument();
$doc->loadHTML($content);
$xpath = new DomXPath($doc);
$action = $xpath->query('//form/@action')->item(0);
var_dump($action);
Similarly, to get the first input
$action = $xpath->query('//form/input')->item(0);
To get all input
fields
for($i=0;$i<$xpath->query('//form/input')->length;$i++) {
$label = $xpath->query('//form/input')->item($i);
var_dump($label);
}
If you're not familiar with XPath, I recommend viewing these examples.
Upvotes: 1