Reputation: 951
So far, my code gets all elements with the class 'forumRow' using an XPath query. How would I get the href attribute of the a element that exists once in every 'forumRow' element?
I'm stuck on how to run a second query that starts from each result of the first query.
My current code:
$this -> boards = array();
$html = @file_get_contents('http://www.roblox.com/Forum/Default.aspx');
libxml_use_internal_errors(true);
$page = new DOMDocument();
$page -> preserveWhiteSpace = false;
$page -> loadHTML($html);
$xpath = new DomXPath($page);
$board_array = $xpath -> query('//*[@class="forumRow"]');
foreach($board_array as $board)
{
    $childNodes = $board -> childNodes;
    $boardName = $childNodes -> item(0) -> nodeValue;
    if (strlen($boardName) > 0)
    {
        $boardDesc = $childNodes -> item(1) -> nodeValue;
        array_push($this -> boards, array($boardName, $boardDesc));
    }
}
$Cache -> saveData(json_encode($this -> boards));
Upvotes: 2
Views: 4901
Reputation: 38672
There is a return statement right in the middle of your function, so the array is never filled and saveData(...) is never called. Just remove that line and your code should work. ;)
$childNodes = $board -> childNodes;
return; // <-- remove this line
$boardName = $childNodes -> item(0) -> nodeValue;
Upvotes: 0
Reputation: 85518
Sad to say, I could not get your code to work (regarding the extraction of the forumRow <td> elements), so I wrote this instead:
$html = @file_get_contents('http://www.roblox.com/Forum/Default.aspx');
libxml_use_internal_errors(true);
$page = new DOMDocument();
$page->preserveWhiteSpace = false;
$page->loadHTML($html);
$xpath = new DomXPath($page);
foreach($xpath->query('//td[@class="forumRow"]') as $element){
    $links=$element->getElementsByTagName('a');
    foreach($links as $a) {
        echo $a->getAttribute('href').'<br>';
    }
}
This produces:
/Forum/Search/default.aspx
/Forum/ShowForum.aspx?ForumID=46
/Forum/ShowForum.aspx?ForumID=14
/Forum/ShowForum.aspx?ForumID=44
/Forum/ShowForum.aspx?ForumID=43
/Forum/ShowForum.aspx?ForumID=45
/Forum/ShowForum.aspx?ForumID=21
/Forum/ShowForum.aspx?ForumID=13
...
a very long list
These are all the hrefs from <td class="forumRow">..<a href= ... ></a>..</td>
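If you would rather stay in XPath for the second step, as the question asks, DOMXPath::query() accepts a context node as its second argument, so a query can start from each row found by the first one. Below is a minimal sketch of that idea. The inline $html sample is a made-up stand-in for the real page (its structure is an assumption), and $boards is just an illustrative local array rather than your $this -> boards property.

```php
<?php
// Hypothetical sample markup standing in for the forum page (an assumption;
// the real page structure may differ).
$html = <<<HTML
<table>
  <tr>
    <td class="forumRow"><a href="/Forum/ShowForum.aspx?ForumID=46">Board A</a><span>First board</span></td>
  </tr>
  <tr>
    <td class="forumRow"><a href="/Forum/ShowForum.aspx?ForumID=14">Board B</a><span>Second board</span></td>
  </tr>
</table>
HTML;

libxml_use_internal_errors(true);
$page = new DOMDocument();
$page->loadHTML($html);
$xpath = new DOMXPath($page);

$boards = array();
foreach ($xpath->query('//td[@class="forumRow"]') as $row) {
    // The second argument is the context node; the leading "." makes the
    // expression relative to $row instead of the document root.
    $links = $xpath->query('.//a[@href]', $row);
    if ($links->length > 0) {
        $a = $links->item(0);
        // Store the link text and its href for this row.
        $boards[] = array(trim($a->nodeValue), $a->getAttribute('href'));
    }
}

print_r($boards);
```

Note the leading dot in './/a[@href]': without it, '//a' is evaluated against the whole document even when a context node is passed, so every row would return the same links.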
Upvotes: 4