Reputation: 3630
The feed in question is: http://api.inoads.com/snowstorm/feed.xml
Here is the PHP code I am using for the generation:
<?php
$database = 'xxxx';
$dbconnect = mysql_pconnect('xxxx', 'xxxx', 'xxxx');
mysql_select_db($database, $dbconnect);
$query = "SELECT * FROM the_queue WHERE id LIKE '%' ORDER BY id DESC LIMIT 25";
$result = mysql_query($query, $dbconnect);
while ($line = mysql_fetch_assoc($result))
{
$return[] = $line;
}
$now = date("D, d M Y H:i:s T");
$output = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>
<rss version=\"2.0\">
<channel>
<title>The Queue</title>
<link>http://readapp.net</link>
<description>A curated reading list.</description>
<language>en-us</language>
<pubDate>$now</pubDate>
<lastBuildDate>$now</lastBuildDate>
";
foreach ($return as $line)
{
$output .= "<item><title>".htmlspecialchars($line['title'])."</title>
<description>".htmlspecialchars($line['description'])."</description>
<link>".htmlspecialchars($line['link'])."</link>
<pubDate>".htmlspecialchars($line['pubDate'])."</pubDate>
</item>";
}
$output .= "</channel></rss>";
$fh = fopen('feed.xml', 'w');
fwrite($fh, $output);
?>
What might be causing the error?
Here's a link from the feed validator: http://validator.w3.org/feed/check.cgi?url=http%3A%2F%2Fapi.inoads.com%2Fsnowstorm%2Ffeed.xml
Upvotes: 1
Views: 4676
Reputation: 33329
You said the XML file is UTF-8, but when I download it and open it in my text editor it auto-detects the windows latin1
encoding, and the quotes display perfectly.
If I force my text editor to use UTF-8, it shows an error message because there are illegal characters for the UTF-8 encoding.
Therefore, your data is not UTF-8, it is latin1. You need to find out exactly where that's happening. It could be any one, or several of:
is the HTML page where the content is typed in by the user set to UTF-8?
If not, the browser will be sending latin1 quotes. To fix this, the first tag in your <head>
needs to be:
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
...
</head>
is every browser correctly respecting your UTF-8 setting in that page's HTML?
If you specify UTF-8 and the page contains characters illegal in that encoding, some browsers might decide to use a different encoding despite the <meta>
tag. How to check this is different in every browser.
is the MySQL connection when inserting into the database set to use UTF-8?
You need to be using UTF-8 here, or else MySQL may try to convert the encoding for you, often corrupting them. Set the encoding with:
$database = 'xxxx';
$dbconnect = mysql_pconnect('xxxx', 'xxxx', 'xxxx');
mysql_select_db($database, $dbconnect);
mysql_query('SET NAMES utf8', $dbconnect);
is the MySQL table (and individual column) set to use UTF-8?
Again, to avoid MySQL doing it's own buggy conversion, you need to make sure it's using UTF-8 for the table and also the individual comment. Do a structure dump of the database and check for:
CREATE TABLE `the_queue` (
...
) ... DEFAULT CHARSET=utf8;
And also make sure there isn't something like this on any of the columns:
`description` varchar(255) CHARACTER SET latin1,
is the MySQL connection when reading the database set to use UTF-8?
Your read connection also needs to be utf8
. So double check that.
are you doing anything in the PHP that cannot handle UTF-8?
PHP has some functions which cannot be used on utf-8 strings, as it will corrupt them. One of those functions is htmlentities()
so make sure you always use htmlspecialchars()
. The easiest way to test this is to start commenting out big chunks of your code to see where the encoding is breaking.
Upvotes: 3
Reputation: 77400
The point of htmlentities
is to replace all characters that have define HTML character entities with those entities. If you really don't want any character entities (as your desired result suggests), don't use htmlentities
.
By default, htmlentities
uses the latin-1 charset, so it chokes on the smart quotes (indeed, all multibyte characters), which is where you see the question marks. One fix is to use htmlspecialchars
to convert a much more limited set of characters (&, <, >, ' and "). This will still convert the double quotes because, well, that's the point of htmlspecialchars
, unless you specify the ENT_NOQUOTES
as the second argument. Another fix is to specify the character set as the third argument (this isn't exclusive of using htmlspecialchars
).
The fourth argument to either specifies whether or not to encode already encoded characters. Whether or not do double-encode depends on the source data.
$line['description'] = '"Dave, stop. Stop, will you? Stop, Dave. Will you stop, Dave?” ... “Dave, my mind is going,” HAL says, forlornly. “I can feel it. I can feel it.”';
echo "<description>" . htmlspecialchars($line['description'], ENT_NOQUOTES, 'UTF-8', false) . "</description>";
See also:
Upvotes: 1
Reputation: 7763
There is one problem here:
$output = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>
...
There is a string containing "?>". This is the finalization marker for php. It will give you an error.
You can avoid these problems this way:
$output = "<?xml version=\"1.0\" encoding=\"UTF-8\"?".">
...
Upvotes: 1
Reputation: 365
Another error that you have it´s the format of the date. The date must be in format RFC-822, it must be in a format like this "Wed, 02 Oct 2002 08:00:00 EST", not "July/August 2008".
Upvotes: 0
Reputation: 827
Problem is that you are holding this string with quotes in database (as I assume). If it is true, PHP is removing quotes (which is proper), because of not causing bugs (SQL injection ex). So you have to remove quotes in DB and while generating XML file just add them. It is the simplest in my opinion. And try avoid double quotes ". You should use single ones '. In double PHP parser additionally checks what is in. So try to remove qoutes from DB and add them while generating XML. Should help.
Upvotes: 0