I have the following HTML, and I am using PHP's DOMDocument class to get the element with id 'nextPageBtn' that comes right after the script tag. The problem is that my query does not return anything (as if there were no element with the specified id). Here's the HTML I am parsing:
<body>
<div style='float:left'><img src='../../../../includes/ph1.jpg'></div>
<label style='width: 476px; height: 40px; position: absolute;top:100px; left: 40px; z-index: 2; background-color: rgb(255, 255, 255);; background-color: transparent' >
<font size="4">1a. Nice to meet you!</font>
</label>
<img src='ENG_L1_C1_P0_1.jpg' style='width: 700px; height: 540px; position: absolute;top:140px; left: 40px; z-index: 1;' />
<script type='text/javascript'>
swfobject.registerObject('FlashID');
</script>
<input type="image" id="nextPageBtn" src="../../../../includes/ph4.gif" style="position: absolute; top: 40px; left: 795px; ">
</body>
And here's the PHP code I use to parse it:
$doc = new DOMDocument();
$doc->loadHTMLFile($path);
$doc->encoding = 'UTF-8';
$x = new DOMXPath($doc);
$nextPage = $x->query("//*[@id='nextPageBtn']")->item(0);
if ($nextPage) {
    echo 'found it..';
}
I think the line swfobject.registerObject('FlashID') is generating some kind of error that prevents the element from being found. Could that be the case?
As noted in the comments, your code works flawlessly. Demo: http://codepad.viper-7.com/RUNGOd
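A self-contained check along the same lines might look like the sketch below. It is not the linked demo itself; it assumes loadHTML() on an inline copy of the question's HTML instead of loadHTMLFile($path), so no file is needed, and it leaves out the encoding line.

```php
<?php
// Inline copy of the question's HTML (stands in for the file at $path).
$html = <<<'HTML'
<body>
<div style='float:left'><img src='../../../../includes/ph1.jpg'></div>
<label style='width: 476px; height: 40px; position: absolute;top:100px; left: 40px; z-index: 2; background-color: rgb(255, 255, 255);; background-color: transparent' >
<font size="4">1a. Nice to meet you!</font>
</label>
<img src='ENG_L1_C1_P0_1.jpg' style='width: 700px; height: 540px; position: absolute;top:140px; left: 40px; z-index: 1;' />
<script type='text/javascript'>
swfobject.registerObject('FlashID');
</script>
<input type="image" id="nextPageBtn" src="../../../../includes/ph4.gif" style="position: absolute; top: 40px; left: 795px; ">
</body>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);

// Same XPath query as in the question.
$x = new DOMXPath($doc);
$nextPage = $x->query("//*[@id='nextPageBtn']")->item(0);

if ($nextPage) {
    echo 'found it..', PHP_EOL;
}
```

If this prints "found it.." on your setup while the file-based version does not, the difference lies in the file being loaded rather than in the script tag.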
What you suspect to be the source of the problem:
I think the line swfobject.registerObject('FlashID') is generating some kind of error that prevents the element from being found. Could that be the case?
That can hardly be the cause, because DOMDocument::loadHTMLFile should handle all tags (otherwise you would have received errors or warnings while loading the document). Once loading is done, DOMDocument gives you access to normalized data, so there are no such issues (unless there is a bug in libxml, the underlying library, which is unlikely for something this basic).
So what are the options here? Probably the HTML is not the HTML you think it is. That could happen if loading the HTML fails in your case. Check for errors while loading:
error_reporting(~0); ini_set('display_errors', 1);
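As an alternative to relying on display_errors, libxml can buffer its parser messages so you can inspect them after loading. The sketch below uses the real libxml_use_internal_errors() and libxml_get_errors() functions; the helper name loadHtmlWithErrors is hypothetical, and it takes an HTML string, so for a file you would call loadHTMLFile($path) in its place.

```php
<?php
// Hypothetical helper: load HTML while collecting libxml's parse messages.
function loadHtmlWithErrors(string $html): array
{
    // Buffer libxml errors instead of emitting PHP warnings; remember old setting.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = new DOMDocument();
    $ok = $doc->loadHTML($html);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        // Each LibXMLError carries a level, a message and the line it refers to.
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    // Restore the previous state so other code is not affected.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return [$ok, $doc, $messages];
}

[$ok, $doc, $messages] = loadHtmlWithErrors('<body><p>unclosed <b>tag</p></body>');
var_dump($ok);
print_r($messages);
```

Which messages appear for a given snippet depends on the libxml version, so read them rather than expecting a fixed list.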
Also verify that, after loading, the HTML is what you expect:
$doc->loadHTMLFile($path);
echo $doc->saveHTML();
This outputs the "source" of the document as DOMDocument parsed it.
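To make that check explicit rather than eyeballing the output, you could search the serialized document for the id. This is a minimal sketch that assumes an inline HTML string in place of the file at $path.

```php
<?php
// Inline HTML stands in for the file at $path (an assumption for the sketch).
$doc = new DOMDocument();
$doc->loadHTML('<body><script>swfobject.registerObject("FlashID");</script>'
    . '<input type="image" id="nextPageBtn"></body>');

// saveHTML() serializes the parsed tree, i.e. what DOMXPath will actually see.
$source = $doc->saveHTML();
$hasButton = strpos($source, 'id="nextPageBtn"') !== false;

echo $source;
echo $hasButton ? "nextPageBtn is present\n" : "nextPageBtn is missing\n";
```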
Also check your LIBXML version:
printf("LIBXML version: %s\n", LIBXML_DOTTED_VERSION);
libxml is the underlying library that PHP's DOMDocument is based on. Depending on its version there can be bugs, and not all features work. For example, the getElementById function doesn't work with loadHTMLFile/loadHTML in version 2.6.26, but it does in version 2.7.7 (the XPath expression you're using is not affected in either version).
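Given that version difference, one defensive option is to try getElementById() only on newer libxml and fall back to the XPath query from the question. In this sketch, findById is a hypothetical helper, and the 2.7.7 threshold is taken from the two versions mentioned above; it is an assumption, not the exact version that fixed the behaviour.

```php
<?php
// Hypothetical helper: prefer getElementById(), fall back to XPath.
function findById(DOMDocument $doc, string $id): ?DOMElement
{
    // Assumed threshold: getElementById() is reported to work with 2.7.7.
    if (version_compare(LIBXML_DOTTED_VERSION, '2.7.7', '>=')) {
        $el = $doc->getElementById($id);
        if ($el !== null) {
            return $el;
        }
    }
    // An attribute match in XPath works whether or not libxml registered the id.
    // $id is interpolated as-is, so it must not contain a single quote.
    $node = (new DOMXPath($doc))->query("//*[@id='$id']")->item(0);
    return $node instanceof DOMElement ? $node : null;
}

$doc = new DOMDocument();
$doc->loadHTML('<body><input type="image" id="nextPageBtn"></body>');
$button = findById($doc, 'nextPageBtn');
echo $button ? "found it..\n" : "not found\n";
```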
If you're running into an encoding issue (the source file has a different encoding than expected), that's harder to tell from the information you've provided. Internally, DOMDocument's default encoding in PHP is UTF-8, so setting:
$doc->encoding='UTF-8';
after you've loaded the file looks superfluous to me. You could simply remove it to reduce the code, which makes it easier to find where the error comes from (as I did in the demo).
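If encoding does turn out to be the problem, a common approach (not the one described above) is to declare the charset inside the HTML itself, so libxml decodes the bytes correctly while loading instead of having $doc->encoding assigned afterwards. Without such a declaration, many libxml versions fall back to ISO-8859-1 for HTML input. The sample text below is a placeholder.

```php
<?php
// Declare UTF-8 in the document so the parser decodes it correctly on load.
$html = '<html><head>'
    . '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'
    . '</head><body><font size="4">1a. Nice to meet you, Zoë!</font></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

// textContent is returned as UTF-8; the non-ASCII character survives intact.
$font = $doc->getElementsByTagName('font')->item(0);
echo $font->textContent, PHP_EOL;
```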