Rob

Reputation: 6380

Outputting XML data via PHP gives fatal errors

I've been given data from a previous version of a website (it was a custom CMS) and need to get it into a state where I can import it into my WordPress site.

This is what I'm working on - http://www.teamworksdesign.com/clients/ciw/datatest/index.php. If you scroll down to row 187 the data starts to fail (there should be a red message) with the following error message:

Fatal error: Uncaught exception 'Exception' with message 'String could not be parsed as XML' in /home/teamwork/public_html/clients/ciw/datatest/index.php:132 Stack trace: #0 /home/teamwork/public_html/clients/ciw/datatest/index.php(132): SimpleXMLElement->__construct('

Can anyone see what the problem is and how to fix it?

This is how I'm outputting the data:

<!DOCTYPE html>
<html>
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
<body>

<?php


ini_set('memory_limit','1024M');

ini_set('max_execution_time', 500); // 500 seconds, a little over 8 minutes

echo "<br />memory_limit: " .  ini_get('memory_limit') . "<br /><br />";
echo "<br />max_execution_time: " .  ini_get('max_execution_time') . "<br /><br />";

libxml_use_internal_errors(true); 

$z = new XMLReader;
$z->open('dbo_Content.xml');

$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;

// move to the first <dbo_Content /> node
while ($z->read() && $z->name !== 'dbo_Content');

$c = 0;

// now that we're at the right depth, hop to the next <dbo_Content /> until the end of the tree
while ($z->name === 'dbo_Content')
{

    if($c < 201) {

        // either one should work
        $node = simplexml_import_dom($doc->importNode($z->expand(), true));

        if($node->ClassId == 'policydocument') {

            $c++;

            echo "<h1>Row: $c</h1>";

            echo "<pre>";

            echo htmlentities($node->XML) . "<br /><br /><br /><b>*******</b><br /><br /><br />";

            echo "</pre>";

            try{ 
                // cast to string explicitly: the constructor expects the XML text, not a node
                $xmlObject = new SimpleXMLElement((string) $node->XML);

                foreach ($xmlObject->fields[0]->field as $field) {

                    switch((string) $field['name']) {
                        case 'parentId':
                            echo "<b>PARENT ID: </b> " . $field->value . "<br />";
                            break;
                        case 'title':
                            echo "<b>TITLE: </b> " . $field->value . "<br />";
                            break;
                        case 'summary':
                            echo "<b>SUMMARY: </b> " . $field->value . "<br />";
                            break;
                        case 'body':
                            echo "<b>BODY:</b> " . $field->value . "<br />";
                            break;
                        case 'published':
                             echo "<b>PUBLISHED:</b> " . $field->value . "<br />";
                             break;
                    }
                }

                echo '<br /><h2 style="color:green;">Success on node: '.$node->ContentId.'</h2><hr /><br />';           

            } catch (Exception $e){ 
                echo '<h2 style="color:red;">Failed on node: '.$node->ContentId.'</h2>'; 
            }

        }

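        // go to next <dbo_Content />
        $z->next('dbo_Content');

    } else {
        // limit reached: stop here, otherwise the reader never advances and the loop never ends
        break;
    }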


} ?>

</body>
</html>

Upvotes: 0

Views: 266

Answers (1)

Spudley

Reputation: 168715

The error message you're getting, "String could not be parsed as XML", means that the XML parser found something in the input data that is not valid XML.

You haven't shown us the data, so I can't tell you exactly what is invalid, but something in there is failing to meet the strict rules for XML parsing. There are any number of possible reasons for this.

If I had to stick my neck out and guess, I'd say the most common cause of bad XML in the middle of a file that is otherwise okay is an unescaped & where the &amp; entity should have been used.

Anyone creating their XML with a proper XML writer shouldn't have this issue. However, I've come across plenty of cases where people don't bother with an XML writer, output raw XML as text, and forget to escape the entities. The data is then fine until you come to a company name with an & in it.
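To illustrate the ampersand case, here is a minimal sketch. The company name and element names are made up for the example and are not taken from your data; it only shows that a bare & makes the string unparseable while the escaped form parses.

```php
<?php
// Hypothetical sample data: a company name containing a bare ampersand.
$name = 'Smith & Sons';

// Raw text dropped straight into the markup: not well-formed XML.
$bad = '<company><name>' . $name . '</name></company>';

// Same text escaped for XML first: & becomes &amp;
$good = '<company><name>'
      . htmlspecialchars($name, ENT_XML1 | ENT_QUOTES, 'UTF-8')
      . '</name></company>';

// Collect parser errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$badResult = simplexml_load_string($bad);   // false, the bare & cannot be parsed
libxml_clear_errors();

$goodResult = simplexml_load_string($good); // parses; the value reads back as "Smith & Sons"

var_dump($badResult === false);
echo (string) $goodResult->name, "\n";
```

If the export tool cannot be changed, the same escaping would have to be applied to the text values before the string is handed to the parser.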

If it's as simple as that, and it's a one-off import, you may be able to fix the file manually in a text editor.

However, that's just a guess. You'll need to examine the XML file yourself to find the problem. If you can't spot it by eye, I'd suggest using a GUI XML tool to analyse the file.
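You can also ask the parser itself what it objected to. The sketch below assumes a helper name of my own, xml_parse_errors(); the libxml_* functions it calls are the standard PHP ones. It returns one message per parser error, with the line and column libxml reported.

```php
<?php
// Hypothetical helper: list the parser's complaints about an XML string.
function xml_parse_errors(string $xml): array
{
    // Switch to internal error collection and remember the previous setting.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    // The return value is ignored here; only the recorded errors matter.
    simplexml_load_string($xml);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf(
            'line %d, column %d: %s',
            $error->line,
            $error->column,
            trim($error->message)
        );
    }

    // Leave libxml in the state we found it.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $messages;
}

// Made-up input with a bare ampersand, just to trigger an error.
$errors = xml_parse_errors('<row><title>Tom & Jerry</title></row>');
foreach ($errors as $message) {
    echo $message, "\n";
}
```

Since your script already calls libxml_use_internal_errors(true), looping over libxml_get_errors() inside the catch block should give the same kind of detail for each failing row.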

Hope that helps.

[EDIT]

Okay, I just took a better look at the data in the link you gave, and one thing sticks out like a sore thumb:

encoding="utf-16"

I note that all the data that has worked was using UTF-8, and all the data that has failed is using UTF-16.

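PHP is generally fine with UTF-8, but it won't cope very well at all with UTF-16. A likely mechanism here is that the embedded string still declares encoding="utf-16" while the text PHP hands to SimpleXMLElement has come out of a UTF-8 document, so the declaration and the actual bytes disagree and the parser gives up. Either way, it's fairly clear that this is your problem.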

And, to be honest, there's really no need to ever use UTF-16, so the solution here is to switch to UTF-8 encoding for everything.

How easy that is for you to do, I can't say, but worst case I'm sure you could find a batch conversion tool.
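If the strings are already UTF-8 by the time PHP sees them and only the declaration still says UTF-16, relabelling the declaration before parsing may be enough. This is a sketch under that assumption; relabel_utf16_as_utf8() is a name I made up, and the sample XML is invented to resemble the fields/field/value layout your loop reads.

```php
<?php
// Hypothetical helper: change encoding="utf-16" to encoding="utf-8" in the
// XML declaration. Assumes the string's bytes are ALREADY UTF-8; it does not
// convert any characters.
function relabel_utf16_as_utf8(string $xml): string
{
    return preg_replace(
        '/^(<\?xml[^>]*encoding=["\'])utf-16(["\'])/i',
        '${1}utf-8${2}',
        $xml,
        1 // only the declaration at the very start of the string
    );
}

// Invented sample in the shape the question's loop expects.
$sample = '<?xml version="1.0" encoding="utf-16"?>'
        . '<content><fields><field name="title"><value>Policy</value></field></fields></content>';

$fixed = relabel_utf16_as_utf8($sample);

$xmlObject = new SimpleXMLElement($fixed);
echo (string) $xmlObject->fields[0]->field[0]->value, "\n";
```

In your script that would mean passing relabel_utf16_as_utf8((string) $node->XML) to the SimpleXMLElement constructor. A file whose bytes really are UTF-16 is a different case: there the content itself would need converting, for example with mb_convert_encoding($bytes, 'UTF-8', 'UTF-16'), in addition to fixing the declaration.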

Hope that helps.

Upvotes: 1
