Reputation: 2001
I'm trying to create an RSS feed of League of Legends news, since they don't have one. To do that, I'm parsing the HTML and trying to find all elements that have a certain class attribute.
Here is what I have, but it's not finding anything.
<?php
$page = file_get_contents("http://na.leagueoflegends.com/en/news/");
$dom = new DomDocument();
$dom->load($page);
$finder = new DomXPath($dom);
$classname="node-article";
$nodes = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]");
echo "<pre>" . print_r($nodes, true) . "</pre>";
?>
Edit: Working code...
<?php
$page = file_get_contents("http://na.leagueoflegends.com/en/news/");
$dom = new DomDocument();
@$dom->loadHTML($page);
$finder = new DomXPath($dom);
$classname = "node-article";
$nodes = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]");
$articles = array();
foreach ($nodes as $node) {
    $h4 = $node->getElementsByTagName('h4')->item(0);
    $articles[] = array(
        'title' => htmlentities($h4->firstChild->nodeValue),
        'content' => htmlentities($h4->nextSibling->nodeValue),
        'link' => 'http://na.leagueoflegends.com/en/news' . $h4->firstChild->getAttribute('href')
    );
}
echo "<pre>" . print_r($articles, true) . "</pre>";
?>
Upvotes: 0
Views: 713
Reputation: 10132
You need loadHTML() (which parses a string containing HTML source) instead of load() (which expects the path to an XML document). You are already using file_get_contents(), which reads the entire page into a string, so you have the HTML source in a string and should pass that string to loadHTML().
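To make the difference concrete, here is a minimal sketch that runs without a network connection. The inline markup and the temporary file are assumptions made up for the illustration; they are not taken from the real page.

```php
<?php
// Hypothetical inline markup standing in for the downloaded page.
$html = '<html><body><div class="node-article"><h4>Title</h4></div></body></html>';

// Collect libxml parse errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// loadHTML() parses markup that is already held in a string.
$fromString = new DOMDocument();
$okString = $fromString->loadHTML($html);

// load() expects the path of an XML file, so write one to a temp file first.
$path = tempnam(sys_get_temp_dir(), 'dom');
file_put_contents($path, '<root><h4>Title</h4></root>');
$fromFile = new DOMDocument();
$okFile = $fromFile->load($path);
unlink($path);

// Both documents now expose the same <h4> text.
$stringTitle = $fromString->getElementsByTagName('h4')->item(0)->nodeValue;
$fileTitle = $fromFile->getElementsByTagName('h4')->item(0)->nodeValue;
echo $stringTitle, "\n", $fileTitle, "\n";
```

In your case the markup comes from file_get_contents(), so it is the string variant that applies.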
Try this:
$page = file_get_contents("http://na.leagueoflegends.com/en/news/");
$dom = new DomDocument();
$dom->loadHTML($page);
$finder = new DomXPath($dom);
$classname = "node-article";
$nodes = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]");
echo "<pre>" . print_r($nodes, true) . "</pre>";
// get title and content of article
$arr = array();
foreach ($nodes as $node) {
    $h4 = $node->getElementsByTagName('h4')->item(0);
    $arr[] = array(
        'title' => $h4->nodeValue,
        'content' => $h4->nextSibling->nodeValue,
    );
}
var_dump($arr); // your title & body content
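If you want to check the class query offline, the sketch below runs the same XPath against a small inline document. The markup is hypothetical (the real page structure may differ), and the null checks are an addition of mine for matches that have no h4 or no following sibling. It also uses libxml_use_internal_errors() rather than @ to keep warnings about sloppy real-world HTML quiet.

```php
<?php
// Hypothetical markup; the real page structure is an assumption.
$html = <<<HTML
<div class="node node-article"><h4><a href="/a">First</a></h4><p>Body one</p></div>
<div class="node-article-teaser"><h4>Skipped</h4><p>Not matched</p></div>
<div class="node-article"><p>No heading here</p></div>
HTML;

// Store parse errors internally so malformed HTML does not raise warnings.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$finder = new DOMXPath($dom);
$classname = 'node-article';
// Padding @class with spaces makes the test match whole class tokens only,
// so "node-article-teaser" is not selected.
$nodes = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]");

$articles = array();
foreach ($nodes as $node) {
    $h4 = $node->getElementsByTagName('h4')->item(0);
    if ($h4 === null) {
        continue; // skip matched elements that have no heading
    }
    $next = $h4->nextSibling;
    $articles[] = array(
        'title' => trim($h4->textContent),
        'content' => $next !== null ? trim($next->textContent) : '',
    );
}
print_r($articles);
```

Note that nextSibling can be a whitespace text node when the h4 is followed by a line break in the source, so on the live page you may need to step forward to the next element node.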
Upvotes: 1