Reputation: 3952
Hi, I have a website's home page that I am reading in using cURL, and I need to grab the number of pages that the site has.
The information is in a div:
<div class="pager">
<span class="page-numbers current">1</span>
<a href="/users?page=2" title="go to page 2"><span class="page-numbers">2</span></a>
<a href="/users?page=3" title="go to page 3"><span class="page-numbers">3</span></a>
<a href="/users?page=4" title="go to page 4"><span class="page-numbers">4</span></a>
<a href="/users?page=5" title="go to page 5"><span class="page-numbers">5</span></a>
<span class="page-numbers dots">…</span>
<a href="/users?page=15" title="go to page 15"><span class="page-numbers">15</span></a>
<a href="/users?page=2" title="go to page 2"><span class="page-numbers next"> next</span></a>
</div>
The value I need is 15. It could be any number depending on the site, but it will always be in the same position.
How could I read this value easily and assign it to a variable in PHP?
Thanks
Jonathan
Upvotes: 0
Views: 947
Reputation: 3952
Just wanted to say a huge thank you to Volkerk for helping out - it worked really well. I had to make a few slight changes and ended up with this:-
function getusers($userurl)
{
    $sSourceData = file_get_contents($userurl);
    $doc = new DOMDocument();
    @$doc->loadHTML($sSourceData);
    $xpath = new DOMXPath($doc);
    // the second but last <a> in the pager links to the last page
    $nodelist = $xpath->query('//div[@class="pager"]/a[position()=last()-1]/span');
    if ( 0 < $nodelist->length ) {
        $lastpage = (int) $nodelist->item(0)->nodeValue;
        // every page before the last one is full (35 users per page);
        // the last page is counted separately below, so it is excluded here
        $users = ($lastpage - 1) * 35;
        $userurl = $userurl.'?page='.$lastpage;
        $sSourceData = file_get_contents($userurl);
        $doc = new DOMDocument();
        @$doc->loadHTML($sSourceData);
        $xpath = new DOMXPath($doc);
        // count the users actually listed on the last page
        $nodelist = $xpath->query('//div[@class="user-details"]');
        $users = $users + $nodelist->length;
        echo 'there are ', $users , ' users';
    }
    else {
        // no pager found: everything fits on a single page
        $nodelist = $xpath->query('//div[@class="user-details"]');
        echo 'there are ', $nodelist->length, ' users';
    }
}
Upvotes: 0
Reputation: 96159
You can use PHP's DOM module for that. Read the page with DOMDocument::loadHTMLFile(), then create a DOMXPath object and query all span elements in the document that have the attribute class="page-numbers".
(edit: oops, that's not what you're looking for, see second code snippet)
$html = '<html><head><title>:::</title></head><body>
<div class="pager">
<span class="page-numbers current">1</span>
<a href="/users?page=2" title="go to page 2"><span class="page-numbers">2</span></a>
<a href="/users?page=3" title="go to page 3"><span class="page-numbers">3</span></a>
<a href="/users?page=4" title="go to page 4"><span class="page-numbers">4</span></a>
<a href="/users?page=5" title="go to page 5"><span class="page-numbers">5</span></a>
<span class="page-numbers dots">…</span>
<a href="/users?page=15" title="go to page 15"><span class="page-numbers">15</span></a>
<a href="/users?page=2" title="go to page 2"><span class="page-numbers next"> next</span></a>
</div>
</body></html>';
$doc = new DOMDocument;
// since the content "is already here" we use loadhtml(content)
// instead of loadhtmlfile(url)
$doc->loadhtml($html);
$xpath = new DOMXPath($doc);
$nodelist = $xpath->query('//span[@class="page-numbers"]');
echo 'there are ', $nodelist->length, ' span elements having class="page-numbers"';
edit: does this
<a href="/users?page=15" title="go to page 15"><span class="page-numbers">15</span></a>
(the second but last a element) always point to the last page, i.e. does this link contain the value you're looking for?
Then you can use an XPath expression that selects the second but last a element and from there its child span element.
//div[@class="pager"] <- select each <div> where the attribute class equals "pager"
//div[@class="pager"]/a <- select each <a> that is a direct child of the pager div
//div[@class="pager"]/a[position()=last()-1] <- select the <a> that is second but last
//div[@class="pager"]/a[position()=last()-1]/span <- select the direct child <span> of that second but last <a> element in the pager <div>
( you might want to fetch a good XPath tutorial ;-) )
$doc->loadhtml($html);
$xpath = new DOMXPath($doc);
$nodelist = $xpath->query('//div[@class="pager"]/a[position()=last()-1]/span');
if ( 0 < $nodelist->length ) {
echo $nodelist->item(0)->nodeValue;
}
else {
echo 'not found';
}
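If the value is needed as a number rather than as node text, the same XPath expression can be wrapped in number() and passed to DOMXPath::evaluate(), which returns a typed result instead of a node list. The following is only a sketch under the assumption that the pager markup looks like the one in the question; the function name lastPageNumber is made up for illustration.

```php
<?php
// Hypothetical helper (name is illustrative): returns the last page number
// as an int, or null when the pager link cannot be found.
function lastPageNumber(string $html): ?int
{
    $doc = new DOMDocument();
    // Suppress the warnings libxml emits for not-quite-valid HTML.
    @$doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    // number() converts the text of the first matched node to a float.
    // It yields NaN when nothing matches or the text is not numeric.
    $value = $xpath->evaluate(
        'number(//div[@class="pager"]/a[position()=last()-1]/span)'
    );

    return is_nan($value) ? null : (int) $value;
}

// Shortened version of the markup from the question.
$html = '<html><body><div class="pager">
<span class="page-numbers current">1</span>
<a href="/users?page=2"><span class="page-numbers">2</span></a>
<a href="/users?page=15"><span class="page-numbers">15</span></a>
<a href="/users?page=2"><span class="page-numbers next"> next</span></a>
</div></body></html>';

echo lastPageNumber($html), "\n"; // prints 15
```

Returning null instead of echoing 'not found' leaves it to the caller to decide how a missing pager should be handled.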
Upvotes: 2
Reputation: 2457
perhaps
$nodes = $dom->getElementsByTagName("span");
$maxPageNum = 0;
foreach($nodes as $node)
{
if( $node.class == "page-numbers" && $node.value > $maxPageNum )
{
$maxPageNum = $node.value;
}
}
I don't know PHP, so maybe it's not that easy to access the class/inner text of a dom node, but there must be some way to get that info and the pseudocode here should work.
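For reference, the pseudocode above could be translated into real PHP roughly as follows. This is only a sketch: the function name maxPageNumber is made up for illustration, and it assumes the page numbers sit in span elements whose class attribute is exactly "page-numbers", as in the question's markup.

```php
<?php
// Hypothetical helper (name is illustrative): returns the largest page number
// found in <span class="page-numbers"> elements, or 0 if there are none.
function maxPageNumber(string $html): int
{
    $dom = new DOMDocument();
    // Suppress the warnings libxml emits for not-quite-valid HTML.
    @$dom->loadHTML($html);

    $maxPageNum = 0;
    foreach ($dom->getElementsByTagName('span') as $node) {
        // Exact match, so "page-numbers current", "dots" and "next" are skipped.
        if ($node->getAttribute('class') !== 'page-numbers') {
            continue;
        }
        $text = trim($node->nodeValue);
        // Only consider text that consists purely of digits.
        if (ctype_digit($text) && (int) $text > $maxPageNum) {
            $maxPageNum = (int) $text;
        }
    }
    return $maxPageNum;
}

// Shortened version of the markup from the question.
$html = '<html><body><div class="pager">
<span class="page-numbers current">1</span>
<a href="/users?page=2"><span class="page-numbers">2</span></a>
<span class="page-numbers dots">...</span>
<a href="/users?page=15"><span class="page-numbers">15</span></a>
<a href="/users?page=2"><span class="page-numbers next"> next</span></a>
</div></body></html>';

echo maxPageNumber($html), "\n"; // prints 15
```

Unlike a position-based XPath query, this takes the maximum over all numbered spans, so it does not depend on where the last-page link sits inside the pager.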
Upvotes: 0
Reputation: 746
This is something you might want to use XPath for, which requires loading the page as a DOMDocument object:
$domDoc = new DOMDocument();
$domDoc->loadHTMLFile("http://path/to/yourfile.html");
$xp = new DOMXPath($domDoc);
$nodes = $xp->query("//xpath/to/relevant/node");
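// item(0) is the first matching node; nodeValue is its text content
$value = $nodes->length > 0 ? $nodes->item(0)->nodeValue : null;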
I haven't written a good xpath in a while, so you should do some reading to figure out that part, but it shouldn't be too difficult.
Upvotes: 0
Reputation: 28713
You can parse it with a regular expression. First find all occurrences of <span class="page-numbers">, then select the last one:
// div html code should be in $div_html
preg_match_all('#<span class="page-numbers">(\d+)#', $div_html, $page_numbers);
print_r(end($page_numbers[1])); // prints 15
Upvotes: 0
Reputation: 27856
There is no direct function or easy way to do that. You need to build an HTML parser or use an existing one.
Upvotes: 0