Reputation: 1141
I generate a lot of WordPress posts from an XML feed. The problem: accented characters.
The XML declaration of the feed is:
<?xml version="1.0" encoding="ISO-8859-15"?>
Here is the complete feed: http://flux.netaffiliation.com/rsscp.php?maff=177053821BA2E13E910D54
My site is in UTF-8.
So I use the function utf8_encode() ... but that does not solve the problem: the accents are still displayed incorrectly.
Does anyone have an idea?
EDIT 04-10-2011 18:02 (French time):
Here is my code :
/**
 * Parse an RSS feed from NetAffiliation and convert each item to a post.
 * @param string $flux URL of the external feed
 * @return bool
 */
private function parseFluxNetAffiliation($flux)
{
    $content = file_get_contents($flux);
    $content = iconv("iso-8859-15", "utf-8", $content);
    $xml = new DOMDocument;
    $xml->loadXML($content);
    // get the first link: http://www.netaffiliation.com
    $link = $xml->getElementsByTagName('link')->item(0);
    //echo $link->textContent;
    // collect all items into a multidimensional array
    $items = $xml->getElementsByTagName('item');
    $offers = array();
    // walk the items
    foreach ($items as $item)
    {
        $childs = $item->childNodes;
        // walk the child nodes
        foreach ($childs as $child)
        {
            $offers[$child->nodeName][] = $child->nodeValue;
        }
    }
    unset($offers['#text']);
    // create one article for each offer
    $nbrPosts = count($offers['title']);
    if ($nbrPosts <= 0)
    {
        echo self::getFeedback("Le flux ne continent aucune offre", 'error');
        return false;
    }
    $i = 0;
    while ($i < $nbrPosts)
    {
        // Create post object
        $description = '<p>'.$offers['description'][$i].'</p><p><a href="'.$offers['link'][$i].'" target="_blank">'.$offers['link'][$i].'</a></p>';
        $my_post = array(
            'post_title' => $offers['title'][$i],
            'post_content' => $description,
            'post_status' => 'publish',
            'post_author' => 1,
            'post_category' => array(self::getCatAffiliation())
        );
        // Insert the post into the database (the return value is not checked)
        wp_insert_post($my_post);
        $i++;
    }
    echo self::getFeedback("Le flux a généré {$nbrPosts} article(s) depuis le flux NetAffiliation dans la catégorie affiliation", 'updated');
    return false;
}
All the posts are generated, but the accented characters are garbled. You can see the result here: http://monsieur-mode.com/test/
Upvotes: 2
Views: 1455
Reputation: 1141
mb_convert_encoding() saved my life.
Here is my solution:
$content = preg_replace('/ encoding="ISO-8859-15"/is','',$content);
$content = mb_convert_encoding($content,"UTF-8");
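The call above leaves the source encoding to mbstring's internal default, which may differ between servers. A minimal sketch of the same idea with the source encoding named explicitly, assuming the feed really is ISO-8859-15 and that the mbstring and dom extensions are loaded. The sample feed string is hypothetical and stands in for the downloaded content.

```php
<?php
// Hypothetical sample feed in ISO-8859-15: "\xE9" is "é", "\xA4" is the euro sign.
$content = "<?xml version=\"1.0\" encoding=\"ISO-8859-15\"?>\n"
         . "<rss><item><title>Caf\xE9 10 \xA4</title></item></rss>";

// Drop the encoding declaration so the parser falls back to UTF-8.
$content = preg_replace('/\s+encoding="ISO-8859-15"/i', '', $content);

// Name the source encoding instead of relying on the mbstring default.
$content = mb_convert_encoding($content, 'UTF-8', 'ISO-8859-15');

$xml = new DOMDocument;
$xml->loadXML($content);
$title = $xml->getElementsByTagName('title')->item(0)->nodeValue;
echo $title, "\n";
```

Naming ISO-8859-15 matters for characters such as the euro sign, which sits at a different code point in ISO-8859-1.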
Upvotes: 0
Reputation: 11403
There are plenty of difficulties to master when switching between encodings. Encodings that use more than one byte per character (so-called multibyte encodings) such as UTF-8, which WordPress uses, deserve special attention in PHP.
The encoding of the page you send is declared in the Content-Type header.
The feed is in ISO-8859-15, so you'll need to convert it to UTF-8 using iconv().
Your site works in UTF-8. Functions such as htmlentities() will produce strange characters. For many of these functions there are multibyte alternatives, which are prefixed with mb_. If your encoding is UTF-8, check your files for such functions and replace them if necessary.
For more information about these topics, see Wikipedia on variable-width encodings and the corresponding page in the PHP manual.
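To illustrate the point about byte-oriented versus multibyte functions, here is a small sketch. The sample word is hypothetical, and the example assumes the mbstring extension is available.

```php
<?php
// "é" written as its two UTF-8 bytes, so the example does not depend on
// the encoding this file is saved in.
$word = "caf\xC3\xA9";

$bytes = strlen($word);             // counts bytes: 5
$chars = mb_strlen($word, 'UTF-8'); // counts characters: 4

// Passing the charset explicitly keeps htmlentities() from assuming another one.
$escaped = htmlentities($word, ENT_QUOTES, 'UTF-8');

echo $bytes, ' ', $chars, ' ', $escaped, "\n"; // 5 4 caf&eacute;
```

For the output side, a call such as header('Content-Type: text/html; charset=utf-8') before any output is one way to declare the encoding, although WordPress normally sends this header itself.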
Upvotes: 2
Reputation: 449813
If your incoming XML data is ISO-8859-15, use iconv() to convert it:
$stream = file_get_contents("stream.xml");
$stream = iconv("iso-8859-15", "utf-8", $stream);
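One caveat if the converted string is then handed to an XML parser such as DOMDocument: the XML declaration still announces ISO-8859-15, so the parser would most likely decode the already converted bytes a second time, which produces exactly the kind of garbled accents described in the question. A sketch that updates the declaration together with the bytes, assuming the declaration is written as in the question. The sample string is a hypothetical stand-in for the file contents.

```php
<?php
// Hypothetical stand-in for file_get_contents("stream.xml"): ISO-8859-15 bytes.
$stream = "<?xml version=\"1.0\" encoding=\"ISO-8859-15\"?>\n"
        . "<rss><item><title>Mod\xE8le \xE9t\xE9</title></item></rss>";

// Convert the bytes to UTF-8 ...
$stream = iconv('ISO-8859-15', 'UTF-8', $stream);
// ... and make the declaration agree with the new encoding.
$stream = preg_replace('/encoding="ISO-8859-15"/i', 'encoding="UTF-8"', $stream, 1);

$doc = new DOMDocument;
$doc->loadXML($stream);
$title = $doc->getElementsByTagName('title')->item(0)->nodeValue;
echo $title, "\n";
```

An alternative under the same assumption is to skip the conversion entirely and let DOMDocument read the declared encoding, since the DOM extension returns node values as UTF-8.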
Upvotes: 0
Reputation: 12244
By default, most applications work with UTF-8 data and output UTF-8 content. WordPress is no exception and surely works on a UTF-8 basis.
I would simply not convert anything when printing, and would instead change your header from ISO-8859-15 to UTF-8.
Upvotes: 0