AnJ

Reputation: 125

Parsing RSS with PHP

I'm trying to parse this RSS feed: http://www.mlssoccer.com/rss/en.xml .

$feed = new DOMDocument();
$feed->load($url);
$items = $feed->getElementsByTagName('channel')->item(0)->getElementsByTagName('item');

foreach($items as $key => $item) 
{
    $title = $item->getElementsByTagName('title')->item(0)->firstChild->nodeValue;
    $pubDate = $item->getElementsByTagName('pubDate')->item(0)->firstChild->nodeValue;
    $description = $item->getElementsByTagName('description')->item(0)->firstChild->nodeValue;
    // do some stuff
}

I get "$title" and "$pubDate" without a problem, but "$description" is always empty. What could cause this behaviour, and how can I fix it?

Upvotes: 0

Views: 194

Answers (2)

ThW

Reputation: 19482

There can be whitespace between the opening <description> tag and the opening <![CDATA[. That whitespace is a text node of its own.

So if you access the firstChild of description, you might fetch that whitespace text node instead of the CDATA section.
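
To make this visible, here is a minimal sketch that lists the node types of the children of a description element. The inline XML is a made-up stand-in for the real feed (an assumption for illustration only), shaped so that whitespace surrounds the CDATA section:

```php
<?php
// Hypothetical inline feed used only for illustration; not the real MLS feed.
$xml = <<<XML
<rss><channel><item>
  <title>Example</title>
  <description>
    <![CDATA[<p>Body</p>]]>
  </description>
</item></channel></rss>
XML;

$feed = new DOMDocument();
$feed->loadXML($xml);

$description = $feed->getElementsByTagName('description')->item(0);

// Collect the node type of each direct child of <description>.
$types = [];
foreach ($description->childNodes as $child) {
    $types[] = $child->nodeType;
}

// XML_TEXT_NODE is 3, XML_CDATA_SECTION_NODE is 4.
var_dump($types);

// The first child holds only whitespace, so trimming it leaves nothing.
var_dump(trim($description->firstChild->nodeValue) === '');
```

With this sample the children are a text node, the CDATA section, and another text node, so firstChild never reaches the actual content.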

More generally, you can tell the DOMDocument to ignore whitespace-only nodes:

$feed = new DOMDocument();
$feed->preserveWhiteSpace = false;
$feed->load($url);

Additionally you should check out XPath, it makes reading a DOM much easier:

$xpath = new DOMXPath($feed);

foreach ($xpath->evaluate('//channel/item') as $item) {
    $title = $xpath->evaluate('string(title)', $item);
    $pubDate = $xpath->evaluate('string(pubDate)', $item);
    $description = $xpath->evaluate('string(description)', $item);
    // do some stuff
    var_dump([$title, $pubDate, $description]);
}
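
Note that string(description) returns the full string value of the element, including any whitespace around the CDATA section. As a small sketch, assuming a made-up inline feed in place of the real URL, the XPath 1.0 function normalize-space() can be used to get a trimmed value:

```php
<?php
// Hypothetical inline feed used only for illustration.
$xml = "<rss><channel><item><description>\n  <![CDATA[<p>Body</p>]]>\n</description></item></channel></rss>";

$feed = new DOMDocument();
$feed->loadXML($xml);
$xpath = new DOMXPath($feed);

foreach ($xpath->evaluate('//channel/item') as $item) {
    // Full string value, including the surrounding whitespace.
    $raw = $xpath->evaluate('string(description)', $item);
    // Leading and trailing whitespace stripped, inner runs collapsed.
    $clean = $xpath->evaluate('normalize-space(description)', $item);
    var_dump($raw, $clean);
}
```

Keep in mind that normalize-space() also collapses runs of whitespace inside the text into single spaces. If the description contains markup where that matters, calling trim() on the string(description) result is probably the safer choice.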

Upvotes: 1

Isaac

Reputation: 983

The problem is the CDATA section. Read textContent on the description element itself, instead of nodeValue on its firstChild, to retrieve the value between the tags:

<?php

$feed = new DOMDocument();
$feed->load('http://www.mlssoccer.com/rss/en.xml');
$items = $feed->getElementsByTagName('channel')->item(0)->getElementsByTagName('item');

foreach($items as $key => $item) 
{
    $title = $item->getElementsByTagName('title')->item(0)->firstChild->nodeValue;
    $pubDate = $item->getElementsByTagName('pubDate')->item(0)->firstChild->nodeValue;
    $description = $item->getElementsByTagName('description')->item(0)->textContent; // textContent

}
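
For comparison, here is a minimal sketch of both lookups side by side. It assumes a made-up inline feed rather than the real URL, so it runs without network access:

```php
<?php
// Hypothetical inline feed used only for illustration.
$xml = "<rss><channel><item><description>\n  <![CDATA[<p>Body</p>]]>\n</description></item></channel></rss>";

$feed = new DOMDocument();
$feed->loadXML($xml);

$description = $feed->getElementsByTagName('description')->item(0);

// firstChild is the whitespace text node in front of the CDATA section.
$viaFirstChild = $description->firstChild->nodeValue;

// textContent concatenates all text below the element, CDATA included;
// trim() removes the whitespace that surrounded the CDATA section.
$viaTextContent = trim($description->textContent);

var_dump($viaFirstChild, $viaTextContent);
```

Since textContent still includes the surrounding whitespace, the trim() call is what yields the clean value here.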

Upvotes: 3
