Reputation: 71
I would like to open a page from another website and extract all the links (href) inside a div with class="layout-2-2"
on that page. How can I do this using PHP?
I want to copy every link in layout-2-2 of this webpage.
Here is my current code:
$html = file_get_contents('https://url/');

$doc = new DOMDocument;
$doc->loadHTML($html); // DOMXPath needs a DOMDocument, not a raw string
$xpath = new DOMXpath($doc);

$liens1 = $xpath->query('//div[@class="layout-2-2"]');
$links = [];
foreach ($liens1 as $lien1) {
    $arr = $lien1->getElementsByTagName("a");
    foreach ($arr as $item) {
        $href = $item->getAttribute("href");
        $text = trim(preg_replace("/[\r\n]+/", " ", $item->nodeValue)); // link text, unused below
        $links[] = $href;
    }
}
print_r($links); // echo cannot print an array
Upvotes: 3
Views: 2831
Reputation: 2782
You can use a simple foreach loop to get all the links inside a specific div tag. This assumes the page has been loaded with the Simple HTML DOM Parser library:
// load the page with Simple HTML DOM Parser
include 'simple_html_dom.php';
$html = file_get_html('https://url/');

// find all a tags that have a href in the div layout-2-2
$hrefDetails = $html->find('div.layout-2-2', 0);
$linkArray = array();
foreach ($hrefDetails->find('a[href]') as $link) {
    array_push($linkArray, $link->href);
}

// print result here
echo "<pre>";
print_r($linkArray);
echo "</pre>";
Upvotes: 1
Reputation: 3757
Use the XPath query //div[@class=\"layout-2-2\"]//a/@href
to select the parent div, its descendant a nodes, and their href attributes in a single step.
$html = file_get_contents('https://url/');

$links = [];
$document = new DOMDocument;
libxml_use_internal_errors(true); // suppress warnings from malformed real-world HTML
$document->loadHTML($html);
$xPath = new DOMXPath($document);

$anchorTags = $xPath->evaluate("//div[@class=\"layout-2-2\"]//a/@href");
foreach ($anchorTags as $anchorTag) {
    $links[] = $anchorTag->nodeValue;
}
print_r($links);
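Since https://url/ is only a placeholder, here is a self-contained sketch of the same query, with an inline HTML string standing in for the fetched page:

```php
<?php
// Inline HTML stands in for the downloaded page.
$html = '<div class="layout-2-2"><a href="/a">A</a> <a href="/b">B</a></div>'
      . '<div class="sidebar"><a href="/c">C</a></div>';

$document = new DOMDocument;
$document->loadHTML($html);
$xPath = new DOMXPath($document);

$found = [];
foreach ($xPath->evaluate('//div[@class="layout-2-2"]//a/@href') as $href) {
    $found[] = $href->nodeValue; // the attribute's value, e.g. "/a"
}

print_r($found); // /a and /b only; /c sits outside the target div
```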
Upvotes: 4
Reputation: 5210
The code seems fine, but I'm guessing you're finding that it doesn't work.
If so, it probably has to do with the fact that content nowadays is often not stored in the landing page you're scraping, but requested afterwards by the page via JavaScript AJAX calls. It will therefore not be captured by a simple file_get_contents().
It's kind of like if you go and buy drugs off a drug dealer: he may not have the drugs on him at the time of purchase, but rather calls another person after you've given him money to bring you the goods. Thus, robbing the dealer for drugs may not yield the results you wanted.
Web scraping, as you're trying to do, is quite an art, and you're probably better off using an off-the-shelf package instead of trying to re-invent the wheel yourself. Even then, many websites protect themselves from what are often attempts at link theft.
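One quick way to diagnose this (a sketch, with a hypothetical helper name): if the class name never appears in the raw response, the div is injected later by JavaScript, and no amount of parsing the initial HTML will find it.

```php
<?php
// Returns true if the class name occurs anywhere in the raw HTML string.
// With a real page you would pass file_get_contents('https://url/') here.
function divPresentInRawHtml(string $html, string $class): bool
{
    return strpos($html, $class) !== false;
}

var_dump(divPresentInRawHtml('<div class="layout-2-2">…</div>', 'layout-2-2')); // bool(true)
var_dump(divPresentInRawHtml('<div id="app"></div>', 'layout-2-2'));            // bool(false)
```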
Upvotes: 0
Reputation: 123
You may not be able to use file_get_contents() to get contents from an external URL: it fails if allow_url_fopen is disabled in php.ini, which some hosts do for security reasons!
But you can use cURL for this purpose. cURL makes a web request to the URL and returns the whole HTML as a string.
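A minimal cURL sketch (https://url/ is a placeholder, as in the question):

```php
<?php
$ch = curl_init('https://url/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
$html = curl_exec($ch);                         // string on success, false on failure
curl_close($ch);
// $html can now be fed to DOMDocument::loadHTML() as in the other answers.
```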
Upvotes: -1