Reputation: 1213
Trying to retrieve the contents of a div from an external site with PHP and XPath
This is an excerpt from the page, showing the relevant code. Note: I tried several variations of my query, including adding @ on the class and an a at the end. After that, I use saveHTML() to get the result. See my test:
By the way:
This is my XPath: //*[@id="post-15991"]/div[4]/div[1]
This is the URL: https://wordpress.org/plugins/wp-job-manager/
See the following code:
<?PHP
$url = 'https://wordpress.org/plugins/wp-job-manager/';
$dom = new DOMDocument();
@$dom->loadHTMLFile($url);
$xpath = new DOMXpath($dom);
$elements = $xpath->query('//*[@id="post-15991"]/div[4]/div[1]');
$link = $dom->saveHTML($elements->item(0));
echo $link;
?>
Output: the output is empty.
Background: I got the XPath using Google Chrome. I have several web pages I want to get some data from:
https://wordpress.org/plugins/wp-job-manager/
https://wordpress.org/plugins/participants-database/
https://wordpress.org/plugins/amazon-link/
https://wordpress.org/plugins/simple-membership/
https://wordpress.org/plugins/scrapeazon/
Goal: I need the following data:
Version:
Last updated:
Active installations:
Tested up to:
See, for example, the following from view-source:https://wordpress.org/plugins/wp-job-manager/
<li>Requires WordPress Version:<strong>4.3.1</strong></li>
<li>Tested up to: <strong>4.9.2</strong></li>
Background: I need this data for all my favorite plugins and want to store it in a database or a spreadsheet. There are approximately 70 pages to scrape.
Here is the full XPath for the example:
//*[@id="post-15991"]/div[4]/div[1]
And for job-board-manager:
//*[@id="post-519"]/div[4]/div[1]/ul/li[1]
//*[@id="post-519"]/div[4]/div[1]/ul/li[2]
//*[@id="post-519"]/div[4]/div[1]/ul/li[3]
//*[@id="post-519"]/div[4]/div[1]/ul/li[7]
I used the method from the question "Is there a way to get the XPath in Google Chrome?":
Right-click the item whose XPath you want and choose "Inspect".
Right-click the highlighted area in the console.
Go to Copy XPath.
Upvotes: 0
Views: 59
Reputation: 14941
You are calling loadHTMLFile(), which expects a file path. If you have warnings enabled (that is, without the @ suppression operator), you will see the following warnings:
E_WARNING : type 2 -- DOMDocument::loadHTMLFile(): Attribute class redefined in https://wordpress.org/plugins/wp-job-manager/, line: 190 -- at line 5
E_WARNING : type 2 -- DOMDocument::loadHTMLFile(): Tag header invalid in https://wordpress.org/plugins/wp-job-manager/, line: 201 -- at line 5
E_WARNING : type 2 -- DOMDocument::loadHTMLFile(): Tag nav invalid in https://wordpress.org/plugins/wp-job-manager/, line: 205 -- at line 5
E_WARNING : type 2 -- DOMDocument::loadHTMLFile(): Tag main invalid in https://wordpress.org/plugins/wp-job-manager/, line: 224 -- at line 5
Instead, use loadHTML():
$url = 'https://wordpress.org/plugins/wp-job-manager/';
$dom = new DOMDocument();
@$dom->loadHTML($url);
$xpath = new DOMXpath($dom);
$elements = $xpath->query('//*[@id="post-15991"]/div[4]/div[1]');
$link = $dom->saveHTML($elements->item(0));
echo $link;
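And the result would be just the URL text, because loadHTML() parses its argument as an HTML string and never fetches the page: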
https://wordpress.org/plugins/wp-job-manager/
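To get the actual values (Version, Tested up to, and so on), the page markup has to be passed to loadHTML() as a string. Here is a minimal sketch of that idea. It assumes the data sits in `<li>Label: <strong>value</strong></li>` items as in the excerpt from the question; the live page may be structured differently. The function name `extractPluginMeta` is made up for illustration, and the query matches list items by structure instead of the page-specific `post-15991` id, since that id differs per plugin.

```php
<?php
// Hypothetical helper (not from the original post): collects
// "Label: <strong>value</strong>" pairs from <li> elements in an HTML string.
function extractPluginMeta(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects an empty string
    }

    // Collect libxml parse warnings (e.g. about HTML5 tags such as <nav>)
    // internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html); // expects markup, not a URL
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $meta = [];

    // Assumption: each value of interest is a <strong> directly inside an <li>.
    foreach ($xpath->query('//li[strong]') as $li) {
        // Label: first text node of the <li>, without whitespace or the colon.
        $label = trim($xpath->evaluate('string(text()[1])', $li), " \t\n\r\0\x0B:");
        $value = trim($xpath->evaluate('string(strong[1])', $li));
        if ($label !== '') {
            $meta[$label] = $value;
        }
    }

    return $meta;
}

// Offline demo using the excerpt from the question.
$sample = '<ul><li>Requires WordPress Version:<strong>4.3.1</strong> </li>'
        . '<li>Tested up to: <strong>4.9.2</strong></li></ul>';
print_r(extractPluginMeta($sample));
```

For the real pages, the markup could be fetched first, for example with `$html = file_get_contents($url);` (this needs allow_url_fopen enabled and should check for a `false` return), and the same function could then be called in a loop over the plugin URLs listed in the question.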
Upvotes: 1