Jonathan.

Reputation: 55604

Parsing XML document in PHP

I have an XML file which I'm parsing with SimpleXML in PHP. The first line is:

<?xml version="1.0" encoding="iso-8859-1"?>

The result of the parse is stored in $xml. If I do:

echo $xml->asXML();

then the entire file displays perfectly.

But if I dig into the structure in any way, I get Â's everywhere, e.g.:

echo $xml->Chapter->asXML();

Inside some of the XML elements there is MathML (<math>); this is where the Â's occur. For example, the character is replaced by a Â.

How can I parse the XML file but not lose the MathML characters?

Upvotes: 2

Views: 784

Answers (3)

Yzmir Ramirez

Reputation: 1281

The problem is not your encoding. The problem is that not all browsers support the MathML that your script is echoing to the browser.

http://en.wikipedia.org/wiki/MathML#Web_browsers

I tested this in the following browsers:

  • Safari 5.1.2 - failed
  • Chrome 17.0.9x - partial
  • Firefox 3.6.28 - works

Upvotes: 0

salathe

Reputation: 51970

∈ is not a character that can be represented in ISO 8859-1. Change your XML declaration to say that the document is encoded as UTF-8.

Here is an example demonstrating the problem:

$x = simplexml_load_string('<?xml version="1.0" encoding="iso-8859-1"?>
<example><math>∈</math></example>');
echo $x->math, PHP_EOL;

$x = simplexml_load_string('<?xml version="1.0" encoding="utf-8"?>
<example><math>∈</math></example>');
echo $x->math, PHP_EOL;

This outputs (as UTF-8) the following:

â
∈

SimpleXML will try to convert to UTF-8 when the declared encoding is something different. If the input is already UTF-8 encoded and the encoding declaration is incorrect, that conversion garbles the text, so it is a good idea to fix the declaration rather than give SimpleXML that work to do.
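
If you cannot edit the file itself, one option is to correct the declaration in memory before parsing. The following is only a sketch: declare_utf8() is a hypothetical helper name, and it assumes the bytes really are UTF-8 already and that the XML declaration is the very first thing in the string.

```php
<?php
// Hypothetical helper: rewrite the encoding named in the XML declaration
// to UTF-8. Only correct when the bytes are already UTF-8.
function declare_utf8(string $xml): string
{
    $fixed = preg_replace(
        '/^(<\?xml\b[^>]*?\bencoding\s*=\s*)(["\'])[^"\']*\2/i',
        '${1}${2}UTF-8${2}',
        $xml,
        1
    );

    // preg_replace() returns null on error; fall back to the input.
    return $fixed ?? $xml;
}

// Sample input: UTF-8 bytes with a wrong iso-8859-1 declaration.
$raw = "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\n"
     . "<example><math>\u{2208}</math></example>";

$x = simplexml_load_string(declare_utf8($raw));
echo $x->math, PHP_EOL;
```

With the declaration corrected, SimpleXML has nothing to convert and the character should come through unchanged.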


Also be sure that PHP itself is outputting UTF-8, and telling the browser that this is the case!

You can do this by setting the default_charset INI option (in your php.ini or with ini_set()), or sending the correct Content-Type header (header('Content-Type: text/html; charset=utf-8')).
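
As a rough sketch, both options look something like this near the top of the script, before any output is sent. Using both together is not required; they are shown side by side only for comparison.

```php
<?php
// Option 1: set the charset PHP advertises in its default Content-Type header.
ini_set('default_charset', 'UTF-8');

// Option 2: send the header explicitly. Headers can only be sent before
// any output, so guard the call.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}
```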

Upvotes: 2

Kyborek

Reputation: 1511

You may need to convert the input into another encoding before parsing it with SimpleXML.

  1. Read the file contents as text
  2. Convert the text to a different encoding
  3. Parse it with SimpleXML and do whatever you want
  4. If needed, convert the output back to the original encoding

The iconv() function is very useful for this: http://php.net/manual/en/function.iconv.php
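
The steps above might be sketched as follows. This is an illustration under assumptions, not a drop-in fix: the sample string stands in for the file contents, and it assumes the bytes really are ISO-8859-1. Characters outside ISO-8859-1 (such as ∈) cannot be converted back in step 4.

```php
<?php
// Step 1: the text to parse. In practice this would come from
// file_get_contents(); here it is a sample whose bytes are ISO-8859-1
// ("\xE9" is e-acute in that encoding).
$latin1 = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n"
        . "<note><word>caf\xE9</word></note>";

// Step 2: convert the bytes to UTF-8 and make the declaration agree.
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);
$utf8 = str_replace('encoding="ISO-8859-1"', 'encoding="UTF-8"', $utf8);

// Step 3: parse. SimpleXML hands back strings as UTF-8.
$xml  = simplexml_load_string($utf8);
$word = (string) $xml->word;

// Step 4: convert the output back to the original encoding if needed.
$wordLatin1 = iconv('UTF-8', 'ISO-8859-1', $word);
```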

Upvotes: -1
