Jules420

Reputation: 71

How to get all links in a div in PHP

I would like to open a page from another website and extract all the links (href) inside a div with class="layout-2-2" on that page. How can I do this using PHP?

I want to copy every link in layout-2-2 of this webpage.

Here is my current code:

    $html = file_get_contents('https://url/');
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // DOMXPath needs a DOMDocument, not a raw string
    $xpath = new DOMXpath($doc);
    $liens1 = $xpath->query('//div[@class="layout-2-2"]');
    $links = [];
    foreach ($liens1 as $lien1) {
      $arr = $lien1->getElementsByTagName("a");
      foreach ($arr as $item) {
        $links[] = $item->getAttribute("href");
      }
    }
    print_r($links); // echo cannot print an array

Upvotes: 3

Views: 2831

Answers (4)

Ankur Tiwari

Reputation: 2782

You can use the Simple HTML DOM Parser library (which provides the find() method used below) and a simple foreach to get all the links inside a specific div tag:

    // load the page with the Simple HTML DOM Parser library
    include 'simple_html_dom.php';
    $html = file_get_html('https://url/');

    // find the target div, then all <a> tags inside it that have a href
    $hrefDetails = $html->find('div.layout-2-2', 0);
    $linkArray = array();

    foreach ($hrefDetails->find('a[href]') as $link) {
        array_push($linkArray, $link->href);
    }

    // print result here
    echo "<pre>";
    print_r($linkArray);
    echo "</pre>";

Upvotes: 1

Nikola Kirincic

Reputation: 3757

Use the XPath query //div[@class="layout-2-2"]//a/@href, which walks from the parent div through its descendant a nodes down to their href attributes in a single query:

    $html = file_get_contents('https://url/');
    $links = [];
    $document = new DOMDocument;
    @$document->loadHTML($html); // suppress warnings from malformed HTML
    $xPath = new DOMXPath($document);
    // evaluate() returns a DOMNodeList of the matched href attribute nodes
    $anchorTags = $xPath->evaluate('//div[@class="layout-2-2"]//a/@href');
    foreach ($anchorTags as $anchorTag) {
        $links[] = $anchorTag->nodeValue;
    }
    print_r($links);

Upvotes: 4

dearsina

Reputation: 5210

The code seems fine, but I'm guessing you're finding that it doesn't work.

If so, it probably has to do with the fact that nowadays content is often not stored in the landing page (the one you're scraping), but is requested afterwards by the page via JavaScript AJAX calls, and thus will not be captured by a simple file_get_contents().

It's kind of like buying drugs off a drug dealer: he may not have the drugs on him at the time of purchase, but rather calls another person after you've given him the money to bring you the goods. Thus, robbing the dealer for drugs may not yield the results you wanted.

Web scraping, as you're trying to do, is quite an art, and you're probably better off using an off-the-shelf package instead of trying to reinvent the wheel yourself. Even then, many websites protect themselves from what are often attempts at link theft.
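
A quick way to check whether this is what's happening is to fetch the raw HTML and see whether the target div is there at all; if it isn't, the content is injected by JavaScript after the page loads, and a plain HTTP fetch will never see it. A minimal sketch, assuming the placeholder URL from the question:

    // fetch the raw HTML, before any JavaScript has had a chance to run
    $html = file_get_contents('https://url/'); // placeholder URL

    // if the class name is absent from the raw markup, the div is injected later via AJAX
    if (strpos($html, 'layout-2-2') === false) {
        echo 'div not in the raw HTML - it is loaded via JavaScript';
    } else {
        echo 'div is in the raw HTML - DOMDocument/XPath should work';
    }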

Upvotes: 0

Sinf

Reputation: 123

You may not be able to use file_get_contents() to get the contents of an external URL, because many hosts disable allow_url_fopen for security reasons!

But you can use cURL for this purpose; cURL works like a web request to the URL and will return the whole HTML as a string.
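
A minimal sketch of fetching the page with cURL, assuming the placeholder URL from the question; the returned string can then be parsed with DOMDocument/XPath as in the other answers:

    // fetch the page body with cURL instead of file_get_contents()
    $ch = curl_init('https://url/'); // placeholder URL
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    $html = curl_exec($ch);
    curl_close($ch);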

Upvotes: -1
