HELP

Reputation: 14575

PHP & RSS Feeds & Special Characters validation Problem

I keep getting the validation warning below. Some of my articles contain special characters, and I was wondering how I should handle them in my RSS feeds. Should I use htmlentities or not? If so, how?

In addition, interoperability with the widest range of feed readers could be improved by implementing the following recommendations. line 22, column 35: title should not contain HTML: &

PHP code.

<title>' . htmlentities(strip_tags($title), ENT_QUOTES, "UTF-8") . '</title>

Upvotes: 3

Views: 4476

Answers (3)

Jay Brunet

Reputation: 3750

/* feedvalidator.org (Feedburner recommends this site to validate your feeds) says: "For the widest interop, the RSS Profile recommends the use of the hexadecimal character reference "&#x26;" to represent "&" and "&#x3C;" to represent "<"." */

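        // find title problems
        // "\x92" and "\x84" are Windows-1252 bytes (curly apostrophe, low double quote).
        // They must be in double quotes so PHP interprets the escape; in single
        // quotes they are the literal four characters \x92 and never match.
        // This assumes single-byte Windows-1252 input, not UTF-8.
        $find[] = '<';
        $find[] = "\x92";
        $find[] = "\x84";

        // find content problems
        $find_c[] = "\x92";
        $find_c[] = "\x84";
        $find_c[] = '&nbsp;';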

        // replace title
        $replace[] = '&#x3C;';
        $replace[] = '&#39;';
        $replace[] = '&#34;';

        // replace content
        $replace_c[] = '&#39;';
        $replace_c[] = '&#34;';
        $replace_c[] = ' ';

        // We don't want to re-replace "&" characters.  
        // So do this first because of PHP "feature" https://bugs.php.net/bug.php?id=33773
        $title = str_replace('&', '&#x26;', $title); 
        $title = str_replace($find, $replace, $title);
        $post_content = str_replace($find_c, $replace_c, $row[3]);

        // http://productforums.google.com/forum/#!topic/merchant-center/nIVyFrJsjpk
        $link = str_replace('&', '&amp;', $link);

Of course I'm doing some pre-processing before $title, $post_content and $link are added to my database. But this should help solve some common problems to get a valid RSS feed.

Update: Fixed the &#x26;#x26;#x26; "recursion" problem, see https://bugs.php.net/bug.php?id=33773
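
The ordering problem can also be avoided with a single-pass replacement. This is only a minimal sketch, assuming the title is already plain text with tags stripped; the function name `rss_escape_title` is made up for illustration and is not part of the code above:

```php
<?php
// Hypothetical helper: escape a plain-text title for an RSS <title> element.
// strtr() with an array scans the input once and never rescans its own
// output, so the "&" inside "&#x3C;" is not escaped a second time.
function rss_escape_title(string $title): string
{
    return strtr($title, [
        '&' => '&#x26;',
        '<' => '&#x3C;',
    ]);
}

echo rss_escape_title('Tom & Jerry <3'), "\n";
```

Because nothing is replaced twice, the order of the array entries does not matter here, unlike with chained str_replace() calls.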

Upvotes: 1

dqhendricks

Reputation: 19251

Take out the htmlentities(). It is meant for HTML output: it emits named entities (for example &eacute;) that plain XML does not define.
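
To see the difference, compare htmlentities() with htmlspecialchars() on the same input. This is a rough sketch, assuming PHP 5.4 or later (for the `ENT_XML1` flag), a script file saved as UTF-8, and a made-up sample title:

```php
<?php
$title = 'Café & Bar <b>';

// htmlentities() converts every character that has an HTML named entity.
// "&eacute;" is defined by HTML, not by XML, so an XML parser may reject it.
$asHtml = htmlentities($title, ENT_QUOTES, 'UTF-8');

// htmlspecialchars() only touches the markup-significant characters and
// leaves "é" as literal UTF-8, which is fine in a UTF-8 encoded feed.
$asXml = htmlspecialchars($title, ENT_XML1 | ENT_QUOTES, 'UTF-8');

echo $asHtml, "\n"; // Caf&eacute; &amp; Bar &lt;b&gt;
echo $asXml, "\n";  // Café &amp; Bar &lt;b&gt;
```

Note that "&" and "<" still have to be escaped somehow for the XML to be well formed; the point is only that the HTML-specific named entities should go.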

Upvotes: 0

RobertPitt

Reputation: 57258

You should use CDATA to escape characters in your XML feeds. This allows you to use your raw data without disrupting the XML layout.

Try this:

<title><![CDATA[ YOUR RAW CONTENT]]></title>

Note: do not use htmlentities and strip_tags here, since they escape the content for a browser. Inside CDATA, any feed reader should read the raw text correctly.

Quote from w3schools:

The term CDATA is used about text data that should not be parsed by the XML parser. Characters like "<" and "&" are illegal in XML elements. "<" will generate an error because the parser interprets it as the start of a new element. "&" will generate an error because the parser interprets it as the start of an character entity. Some text, like JavaScript code, contains a lot of "<" or "&" characters. To avoid errors script code can be defined as CDATA. Everything inside a CDATA section is ignored by the parser. A CDATA section starts with "<![CDATA[" and ends with "]]>":

http://www.w3schools.com/xml/xml_cdata.asp
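
One caveat with CDATA is that the raw content must not itself contain "]]>", since that would close the section early. Here is a minimal sketch of a wrapper that handles this case; the function name `rss_cdata` is hypothetical and only for illustration:

```php
<?php
// Hypothetical helper: wrap raw text in a CDATA section.
// A literal "]]>" inside the text would end the section early, so it is
// split across two adjacent CDATA sections.
function rss_cdata(string $text): string
{
    $safe = str_replace(']]>', ']]]]><![CDATA[>', $text);
    return '<![CDATA[' . $safe . ']]>';
}

echo '<title>' . rss_cdata('Tom & Jerry <3') . '</title>', "\n";
```

A parser reads the two adjacent sections back as one continuous piece of text, so the original "]]>" survives intact.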

Upvotes: 3
