Reputation: 14575
I keep getting the validation warning below. Some of my articles contain special characters, so how should I handle them in my RSS feed: render them or not? Should I use htmlentities or not? If so, how?
In addition, interoperability with the widest range of feed readers could be improved by implementing the following recommendations. line 22, column 35: title should not contain HTML:
&
PHP code.
<title>' . htmlentities(strip_tags($title), ENT_QUOTES, "UTF-8") . '</title>
Upvotes: 3
Views: 4476
Reputation: 3750
/* feedvalidator.org (Feedburner recommends this site to validate your feeds) says: "For the widest interop, the RSS Profile recommends the use of the hexadecimal character reference "&#x26;" to represent "&" and "&#x3C;" to represent "<". */
// find title problems
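$find[] = '<';
$find[] = "\x92"; // Windows-1252 right single quote; double quotes so PHP interprets the escape
$find[] = "\x84"; // Windows-1252 low double quote
// find content problems
$find_c[] = "\x92";
$find_c[] = "\x84";
$find_c[] = '&nbsp;'; // not a predefined XML entity
// replace title
$replace[] = '&#x3C;';
$replace[] = '&#x27;';
$replace[] = '&#x22;';
// replace content
$replace_c[] = '&#x27;';
$replace_c[] = '&#x22;';
$replace_c[] = '&#xA0;';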
// We don't want to re-replace "&" characters.
// So do this first because of PHP "feature" https://bugs.php.net/bug.php?id=33773
$title = str_replace('&', '&#x26;', $title);
$title = str_replace($find, $replace, $title);
$post_content = str_replace($find_c, $replace_c, $row[3]);
// http://productforums.google.com/forum/#!topic/merchant-center/nIVyFrJsjpk
$link = str_replace('&', '&amp;', $link);
Of course I'm doing some pre-processing before $title, $post_content and $link are added to my database. But this should help solve some common problems to get a valid RSS feed.
Update: Fixed the &#x26;#x26; "recursion" problem, see https://bugs.php.net/bug.php?id=33773
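The re-replacement problem can also be sidestepped with strtr(), which scans the input once and never revisits text it has already substituted. The following is only a minimal sketch, not the code from this answer: the function name rss_escape_title is hypothetical, and it assumes the title is plain UTF-8 text that should use the hexadecimal references &#x26;, &#x3C; and &#x3E;.

```php
<?php
// Hypothetical helper: escape XML-significant characters in one pass.
// strtr() with an array does not re-scan its own output, so the "&" in
// "&#x3C;" is never escaped a second time.
function rss_escape_title(string $text): string
{
    return strtr($text, [
        '&' => '&#x26;',
        '<' => '&#x3C;',
        '>' => '&#x3E;',
    ]);
}

// For comparison: str_replace() with arrays applies each pair in turn to
// the whole string, so a later "&" rule also hits earlier replacements.
$naive = str_replace(['<', '&'], ['&#x3C;', '&#x26;'], 'a < b');

echo rss_escape_title('Fish & Chips <cheap>'), "\n"; // Fish &#x26; Chips &#x3C;cheap&#x3E;
echo $naive, "\n";                                   // a &#x26;#x3C; b
```

With strtr() the order of the pairs no longer matters, so there is no need to remember to handle "&" first.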
Upvotes: 1
Reputation: 57258
You should use CDATA to escape characters in your XML feeds. This lets you use your raw data without disrupting the XML layout.
Try this:
<title><![CDATA[ YOUR RAW CONTENT]]></title>
Note: do not use htmlentities and strip_tags, as this will escape them for the browser, and any other reader should read them correctly.
Quote from w3schools:
The term CDATA is used about text data that should not be parsed by the XML parser. Characters like "<" and "&" are illegal in XML elements. "<" will generate an error because the parser interprets it as the start of a new element. "&" will generate an error because the parser interprets it as the start of an character entity. Some text, like JavaScript code, contains a lot of "<" or "&" characters. To avoid errors script code can be defined as CDATA. Everything inside a CDATA section is ignored by the parser. A CDATA section starts with "<![CDATA[" and ends with "]]>":
http://www.w3schools.com/xml/xml_cdata.asp
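One caveat with the CDATA approach: raw content that itself contains "]]>" would close the section early. A common workaround is to split that sequence across two adjacent CDATA sections. The sketch below assumes a hypothetical helper named cdata_wrap and a made-up sample title; it is an illustration, not part of the original answer.

```php
<?php
// Hypothetical helper: wrap raw text in a CDATA section.
// Any "]]>" inside the text is split so that "]]" ends one section and
// ">" starts the next; a parser joins the pieces back into "]]>".
function cdata_wrap(string $text): string
{
    $safe = str_replace(']]>', ']]]]><![CDATA[>', $text);
    return '<![CDATA[' . $safe . ']]>';
}

// Sample title invented for illustration.
echo '<title>' . cdata_wrap('Fish & Chips <cheap>') . '</title>', "\n";
```

The "&" and "<" in the sample pass through untouched, because the parser does not interpret markup inside a CDATA section.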
Upvotes: 3