zero

Reputation: 1213

Retrieve the contents of a div from an external site

I am trying to retrieve the contents of a div from an external site with PHP and XPath.

The relevant code and an excerpt from the page are shown below. Note: I tried several variations of the query, including adding @ to the class and a at the end of the query. After that, I use saveHTML() to get the result. See my test:

btw:

This is my XPath: //*[@id="post-15991"]/div[4]/div[1]
This is the URL: https://wordpress.org/plugins/wp-job-manager/

See the following code:

<?php
$url = 'https://wordpress.org/plugins/wp-job-manager/';
$dom = new DOMDocument();
@$dom->loadHTMLFile($url);
$xpath = new DOMXPath($dom);
$elements = $xpath->query('//*[@id="post-15991"]/div[4]/div[1]');
$link = $dom->saveHTML($elements->item(0));
echo $link;
?>

Output: but I get nothing back.

Background:

I got the XPath using Google Chrome. These are the pages I want to get data from:

https://wordpress.org/plugins/wp-job-manager/
https://wordpress.org/plugins/participants-database/
https://wordpress.org/plugins/amazon-link/
https://wordpress.org/plugins/simple-membership/
https://wordpress.org/plugins/scrapeazon/

Goal: I need the following data:

Version:
Last updated:
Active installations:
Tested up to:

See, for example, the following: view-source:https://wordpress.org/plugins/wp-job-manager/

  • Version: 1.29.3
  • Last updated: 5 days ago
  • Active installations: 100,000+

<li>Requires WordPress Version:<strong>4.3.1</strong></li>
<li>Tested up to: <strong>4.9.2</strong></li>
    

    Background: I need the data from all my favorite plugins and want to have it in a database or a spreadsheet. There are approximately 70 pages to scrape.
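Since the goal is a database or a spreadsheet, one option is to collect one row per plugin page and write the rows out as CSV, which most spreadsheet programs can open. This is only a minimal sketch that assumes the fields have already been extracted; writeCsv() and the column names are hypothetical, and the sample row reuses the values quoted in the excerpt above.

```php
<?php
// Sketch: write one CSV row per plugin page.
// writeCsv() and the column names are hypothetical, not part of any library.

function writeCsv($stream, array $rows): void
{
    // Delimiter, enclosure, and escape are passed explicitly for portability.
    $header = ['url', 'version', 'last_updated', 'active_installations', 'tested_up_to'];
    fputcsv($stream, $header, ',', '"', '\\');
    foreach ($rows as $row) {
        fputcsv($stream, $row, ',', '"', '\\');
    }
}

// Sample row built from the values quoted in the question.
$rows = [
    ['https://wordpress.org/plugins/wp-job-manager/', '1.29.3', '5 days ago', '100,000+', '4.9.2'],
];

// Print the CSV; swap php://output for a file path to save it instead.
$out = fopen('php://output', 'w');
writeCsv($out, $rows);
fclose($out);
```

Fields that contain a comma, such as 100,000+, are quoted by fputcsv(), so they should survive a round trip through a spreadsheet import.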

    Here is the full XPath for the example:

    //*[@id="post-15991"]/div[4]/div[1]
    

    And for job-board-manager:

    //*[@id="post-519"]/div[4]/div[1]/ul/li[1]
    //*[@id="post-519"]/div[4]/div[1]/ul/li[2]
    //*[@id="post-519"]/div[4]/div[1]/ul/li[3]
    //*[@id="post-519"]/div[4]/div[1]/ul/li[7]
    

    I used this method, from "Is there a way to get the XPath in Google Chrome?":

    Right-click the item whose XPath you are trying to find and choose "Inspect".
    Right-click the highlighted area in the console.
    Go to Copy XPath.
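
The copied XPaths above depend on a per-plugin id (post-15991, post-519) and on element positions, so each page would need its own expression. A possible alternative, sketched here, is to select each list item by its label text. This assumes every field has the same shape as the <li>Tested up to: <strong>4.9.2</strong></li> line in the page source; extractField() and the $sample markup are hypothetical and only illustrate the idea.

```php
<?php
// Sketch: look fields up by their label text instead of by position.
// Assumes each field is an <li> of the form "Label: <strong>value</strong>".
// extractField() is a hypothetical helper, not a library function.

function extractField(DOMXPath $xpath, string $label): ?string
{
    // The labels used here contain no double quotes, so they can be inlined.
    $query = sprintf('//li[starts-with(normalize-space(.), "%s")]/strong', $label);
    $nodes = $xpath->query($query);
    // query() returns false on an invalid expression, an empty list on no match.
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// Assumed markup, modelled on the excerpt from the page source.
$sample = '<ul>'
    . '<li>Version: <strong>1.29.3</strong></li>'
    . '<li>Requires WordPress Version:<strong>4.3.1</strong></li>'
    . '<li>Tested up to: <strong>4.9.2</strong></li>'
    . '</ul>';

$dom = new DOMDocument();
$dom->loadHTML($sample);
$xpath = new DOMXPath($dom);

foreach (['Version:', 'Tested up to:'] as $label) {
    echo $label, ' ', extractField($xpath, $label) ?? 'not found', PHP_EOL;
}
```

Because the query does not mention the post id, the same expression could be reused across plugin pages, provided they really share this markup.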
    

    Upvotes: 0

    Views: 59

    Answers (1)

    Chin Leung

    Reputation: 14941

    You are calling loadHTMLFile(), which expects a file path. If warnings are enabled (your code suppresses them with @), you will see the following:

    E_WARNING : type 2 -- DOMDocument::loadHTMLFile(): Attribute class redefined in https://wordpress.org/plugins/wp-job-manager/, line: 190 -- at line 5

    E_WARNING : type 2 -- DOMDocument::loadHTMLFile(): Tag header invalid in https://wordpress.org/plugins/wp-job-manager/, line: 201 -- at line 5

    E_WARNING : type 2 -- DOMDocument::loadHTMLFile(): Tag nav invalid in https://wordpress.org/plugins/wp-job-manager/, line: 205 -- at line 5

    E_WARNING : type 2 -- DOMDocument::loadHTMLFile(): Tag main invalid in https://wordpress.org/plugins/wp-job-manager/, line: 224 -- at line 5

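    Instead, use loadHTML(). It expects a string of markup rather than a path or URL, so fetch the page contents first:

    $url = 'https://wordpress.org/plugins/wp-job-manager/';
    $dom = new DOMDocument();
    // loadHTML() parses a string of markup, so download the page first.
    @$dom->loadHTML(file_get_contents($url));
    $xpath = new DOMXPath($dom);
    $elements = $xpath->query('//*[@id="post-15991"]/div[4]/div[1]');
    // saveHTML() returns the markup of the first matching node.
    $link = $dom->saveHTML($elements->item(0));
    echo $link;
    

    If you pass the URL itself to loadHTML(), the URL string is parsed as the document, and the result would be:

    https://wordpress.org/plugins/wp-job-manager/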
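
For reference, a slightly more defensive variant is sketched here. It parses markup that has already been downloaded, silences libxml warnings with libxml_use_internal_errors() instead of @, and returns null when the XPath matches nothing, which is one possible reason for empty output if the served HTML differs from the DOM shown in Chrome. loadMarkup() and firstMatchHtml() are hypothetical helper names.

```php
<?php
// Sketch: parse markup that has already been fetched, then run an XPath query.
// loadMarkup() and firstMatchHtml() are hypothetical helper names.

function loadMarkup(string $html): DOMDocument
{
    $dom = new DOMDocument();
    // Buffer parser warnings (e.g. unknown HTML5 tags) instead of using @.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $dom;
}

function firstMatchHtml(DOMDocument $dom, string $query): ?string
{
    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query($query);
    // query() returns false on an invalid expression, an empty list on no match.
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return $dom->saveHTML($nodes->item(0));
}

// Usage with a live page (needs allow_url_fopen; check for false before parsing):
// $html = file_get_contents('https://wordpress.org/plugins/wp-job-manager/');
// if ($html !== false) {
//     echo firstMatchHtml(loadMarkup($html), '//*[@id="post-15991"]/div[4]/div[1]') ?? 'no match';
// }
```

Checking for null makes it clear whether the page failed to load or the expression simply did not match anything.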
    

    Upvotes: 1
