Veltar
Veltar

Reputation: 741

php-script outputs xml file but breaks on special characters

I have created this php script, to create an xml-file from my database:

<?php
header("Content-type: text/xml");
header("Expires: Mon, 26 Jul 1990 05:00:00 GMT");
header("Last-Modified: " . gmdate("D, d M Y H:i:s") . " GMT");
header("Cache-Control: no-store, no-cache, must-revalidate");
header("Cache-Control: post-check=0, pre-check=0", false);
header("Pragma: no-cache");

$host = "localhost";
$user = "root";
$pass = "root";
$database = "flexapp";

$charToReplace = array("é", "è", "ê");
$charReplacements = array("e", "e", "e");

$linkID = mysql_connect($host, $user, $pass) or die("Could not connect to host.");
mysql_select_db($database, $linkID) or die("Could not find database.");

$query = "SELECT * FROM artists";
$artist_result = mysql_query($query, $linkID) or die("Data not found.");

$xml_output = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n";
$xml_output .= "<artists>\n";

for($x = 0 ; $x < mysql_num_rows($artist_result) ; $x++){
    $itemrow = mysql_fetch_assoc($artist_result);
    $xml_output .= "\t<artist>\n";
    $xml_output .= "\t\t<id>" . $itemrow['pk_artist_id'] . "</id>\n";
    $itemrow['artist_name'] = str_replace($charToReplace, $charReplacements,         $itemrow['artist_name']);
    $xml_output .= "\t\t<name>" . $itemrow['artist_name'] . "</name>\n";
    $xml_output .= "\t\t<picture>" . $itemrow['artist_pic'] . "</picture>\n";
    $xml_output .= "\t\t<twitter>" . $itemrow['artist_twitter'] . "</twitter>\n";
    $xml_output .= "\t</artist>\n";
}
$xml_output .= "</artists>\n";
echo $xml_output;
?>

I try to replace characters like é en ê to e.

But it gives this error in the browser:

This page contains the following errors:

error on line 23 at column 9: Encoding error

This is the xml that is outputted:

<?xml version="1.0" encoding="UTF-8"?>
<artists>
<artist>
    <id>1155</id>
    <name>Have Heart</name>
    <picture>http://userserve-ak.last.fm/serve/126/29086375.jpg</picture>
    <twitter></twitter>
</artist>
<artist>
    <id>1156</id>
    <name>Dead Swans</name>
    <picture>http://userserve-ak.last.fm/serve/126/4781939.jpg</picture>
    <twitter></twitter>
</artist>
<artist>
    <id>1157</id>
    <name>Nirvana</name>
    <picture>http://userserve-ak.last.fm/serve/126/3991355.jpg</picture>
    <twitter></twitter>
</artist>
<artist>
    <id>1158</id>
    <name>Touchter>
</artist>

But the last one (the name in the database is Touché Amoré) should be like all the rest, but somehow the strings aren't being replaced.

Upvotes: 2

Views: 3858

Answers (4)

CD001
CD001

Reputation: 8472

If your database is in utf-8 and your XML file is in utf-8; you shouldn't need to do any conversion for "foreign" characters like é.

First off, ensure your database tables are stored in utf-8 and then fix your XML file to utf-8...

header("Cache-Control: post-check=0, pre-check=0", false);
header("Pragma: no-cache");
header("Content-type:text/xml;charset=utf-8"); //  <- add this line

Next - make sure that your MySQL connector is retrieving the utf-8 data from the database in utf-8 (it doesn't by default).

$linkID = mysql_connect($host, $user, $pass) or die("Could not connect to host.");
mysql_select_db($database, $linkID) or die("Could not find database.");
mysql_set_charset("UTF8"); //  <-- add this line; NOTE MySQL doesn't use the hyphen in the utf-8 string

In theory now, all your é, ê and ñ (and so on) characters should appear properly inside the XML document without any conversions or CDATA parsing required.

The only characters you'll need to convert are, IIRC, the same as those converted by htmlspecialchars namely <, >, & and (optionally) quotes and apostrophes.

Upvotes: 0

dchrastil
dchrastil

Reputation: 592

This may be an issue with encoding the data before printing. Try using utf8_encode() and htmlentities()

function xmlencode($data) {
    $data = utf8_encode($data);
    $data = htmlentities($data);
    return $data;
}

and then

...
$xml_output .= "\t\t<picture>" . xmlencode($itemrow['artist_pic']) . "</picture>\n";
...

Upvotes: 3

Rifat
Rifat

Reputation: 7738

Why don't you just wrap name variable with CDATA like

header('Content-Type: text/xml, charset=utf-8');

<name><![CDATA[Some very wired name]]></name>

And, If I were you then I will use SimpleXML to generate the XML.

Upvotes: 0

Dennis Jamin
Dennis Jamin

Reputation: 386

I am guessing the characters are saved as entities, which is why it will not replace and will 'destroy' your XML output.

Your scripts works like a charm over here and outputs as expected. Will you check your database via phpmyadmin if the saved value contains é instead of é? In that case you should decode the entities first to the characters they represent: http://php.net/manual/en/function.html-entity-decode.php

Upvotes: 0

Related Questions