Empty Array

Reputation: 3

Loading external div with PHP

I was trying to load a page from H&M (for study purposes) when I noticed that the content of one div isn't loaded, although if I save the page from the browser, the div is saved correctly. Can anyone explain to me why this happens?
The div (and, most importantly, its contents) I'm looking for is:
body > div main > div content > div relatedInformationContainer
(inside it there is a lot of content: div relatedInformation > etc...)
This is the code I used:

<?php
$url = "http://www.hm.com/gb/product/05427";
libxml_use_internal_errors(true);
$html = file_get_contents($url);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xp = new DOMXPath($dom);

$contentDivs = $xp->query('//div[@id="content"]')->item(0);
$numContentDivs = $xp->evaluate('count(div)', $contentDivs);
// echo $numContentDivs; // output: 3 (correct)
$relatedDiv = $xp->query('//div[@id="content"]/div[2]')->item(0)->getAttribute("id");
echo $relatedDiv; // output: relatedInformationContainer (correct)
$relatedDivContent = $xp->query('//div[@id="content"]/div[2]')->item(0);
$numRelatedDivContent = $xp->evaluate('count(div)', $relatedDivContent);
echo $numRelatedDivContent; // output: 0 (incorrect! it should output 1)
?>

I also tried a simpler method, with the same result:

<?php
$url = "http://www.hm.com/gb/product/05427";
$doc = new DOMDocument();
$load = @$doc->loadHTMLFile($url);
echo $doc->saveHTML();
?>

I would appreciate it if anyone could explain to me why this happens, and whether there's a solution. Thanks.

Upvotes: 0

Views: 112

Answers (1)

LSerni

Reputation: 57398

The DIV is loaded by JavaScript, so its contents are not part of the HTML that PHP downloads. You need to find out which request the JavaScript makes and replicate it in PHP.

Using Firefox with Firebug, I see that the page issues a call to

http://www.hm.com/gb/product/05427/05427-A/related

which returns the DIV with all its contents (I guess it replaces the DIV). You will have to capture that.
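As a rough sketch of how that capture could look in PHP: the two helper functions below (`fetchFragment` and `countNodes`) are hypothetical names of my own, and I'm assuming the `/related` URL answers a plain GET with an HTML fragment, which I have not verified beyond what Firebug shows.

```php
<?php
// Sketch only: fetchFragment() and countNodes() are hypothetical helper names.

/**
 * Download a URL and return its body, failing loudly instead of returning false.
 */
function fetchFragment(string $url): string
{
    $body = file_get_contents($url);
    if ($body === false) {
        throw new RuntimeException("Could not fetch $url");
    }
    return $body;
}

/**
 * Parse an HTML string (a full page or just a fragment) and count the nodes
 * matched by an XPath expression.
 */
function countNodes(string $html, string $query): int
{
    // Collect libxml warnings internally; real-world markup is rarely valid.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xp = new DOMXPath($dom);
    // evaluate() returns a float for count(); cast it for convenience.
    return (int) $xp->evaluate('count(' . $query . ')');
}
```

Calling `countNodes(fetchFragment('http://www.hm.com/gb/product/05427/05427-A/related'), '//div')` should then report the divs that were missing from the main page, provided the server answers as described.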

Also, some servers check who is asking for what, and on whose behalf. So the request above might fail unless its Referer header (which a PHP server sees as HTTP_REFERER) is set to the correct originating page, along with the right User-Agent, session cookies, etc. That is the general case; it appears not to apply here, though I may be wrong.
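If a server does turn out to check those headers, one way to send them from PHP is a stream context. This is only a sketch: `buildBrowserLikeContext` is a hypothetical helper name, and the User-Agent and cookie values you pass in are placeholders that you would copy from a real browser session.

```php
<?php
// Sketch only: buildBrowserLikeContext() is a hypothetical helper name.

/**
 * Build a stream context that sends Referer, User-Agent and (optionally)
 * Cookie headers, for use with file_get_contents().
 */
function buildBrowserLikeContext(string $referer, string $userAgent, string $cookie = '')
{
    $headers = [
        'Referer: ' . $referer,
        'User-Agent: ' . $userAgent,
    ];
    if ($cookie !== '') {
        // e.g. a session cookie string copied from the browser (placeholder value)
        $headers[] = 'Cookie: ' . $cookie;
    }

    return stream_context_create([
        'http' => [
            'method'  => 'GET',
            'header'  => implode("\r\n", $headers) . "\r\n",
            'timeout' => 10, // seconds
        ],
    ]);
}
```

The resulting context goes in as the third argument, as in `file_get_contents($url, false, $context)`, with the product page URL as the referer.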

Upvotes: 1
