Reputation: 11989
I'm trying to parse an XML file and one of the fields looks like the following:
<link>http://foo.com/this-platform/scripts/click.php?var_a=a&var_b=b&varc=http%3A%2F%2Fwww.foo.com%2Fthis-section-here%2Fperf%2F229408%3Fvalue%3D0222%26some_variable%3Dmeee</link>
This seems to break the parser. I think it might have something to do with the & in the link?
My code is quite simple:
<?php
// Load the feed; simplexml_load_file() returns false if the XML is not well-formed.
$xml = simplexml_load_file("files/this.xml");
echo $xml->getName() . "<br />";
foreach ($xml->children() as $child) {
    echo $child->getName() . ": " . $child . "<br />";
}
?>
Any ideas how I can resolve this?
Upvotes: 3
Views: 4995
Reputation: 13673
If your XML already contains some escaped entities, this approach preserves them and escapes only the bare ampersands:
$brokenXmlText = file_get_contents("files/this.xml");
$fixed = preg_replace('/&(?!lt;|gt;|quot;|apos;|amp;|#)/', '&amp;', $brokenXmlText);
$xml = simplexml_load_string($fixed);
Upvotes: 1
Reputation: 950
I think this will help you http://www.php.net/manual/en/simplexml.examples-errors.php#96218
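As a rough sketch of the kind of error handling that manual page covers: libxml can be told to collect parse errors instead of emitting warnings, which shows where the parser gives up. The sample XML string and the variable names below are made up for illustration.

```php
<?php
// Illustrative input: a bare & inside the element makes this not well-formed.
$brokenXml = '<xml><link>http://foo.com/click.php?var_a=a&var_b=b</link></xml>';

// Collect libxml errors internally instead of raising PHP warnings.
libxml_use_internal_errors(true);

$xml = simplexml_load_string($brokenXml);
$errors = libxml_get_errors();

if ($xml === false) {
    foreach ($errors as $error) {
        // Each entry is a LibXMLError object with message, line and column.
        echo trim($error->message), " (line {$error->line}, column {$error->column})\n";
    }
}

// Empty the error buffer so later parses start clean.
libxml_clear_errors();
```

This only reports the problem; the ampersands still have to be escaped before the feed will parse.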
Upvotes: 0
Reputation: 11989
The comment by mjv resolved it:
Alternatively to using &amp;, you may consider putting the URLs and other XML-unfriendly content in <![CDATA[ ... ]]>, i.e. a Character Data block
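A minimal sketch of the CDATA approach, assuming you control how the feed is generated. The sample XML is illustrative and shortened from the question; inside a CDATA section the & needs no escaping.

```php
<?php
// Nowdoc, so PHP does not interpolate anything inside the sample XML.
$cdataXml = <<<'XML'
<xml>
<link><![CDATA[http://foo.com/this-platform/scripts/click.php?var_a=a&var_b=b]]></link>
</xml>
XML;

// LIBXML_NOCDATA merges CDATA sections into ordinary text nodes.
$xml = simplexml_load_string($cdataXml, 'SimpleXMLElement', LIBXML_NOCDATA);

// Casting the element to string yields the raw URL, ampersands included.
$link = (string) $xml->link;
echo $link, "\n";
```

This does not help if the feed comes from a third party that you cannot change; in that case the text has to be repaired before parsing.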
Upvotes: 0
Reputation: 400912
Your XML feed is not valid XML: the & should be escaped as &amp;
This means you cannot use an XML parser on it :-(
A possible "solution" (feels wrong, but should work) would be to replace every '&' that is not part of an entity with '&amp;', to get a valid XML string before loading it with an XML parser.
In your case, considering this:
$str = <<<STR
<xml>
<link>http://foo.com/this-platform/scripts/click.php?var_a=a&var_b=b&varc=http%3A%2F%2Fwww.foo.com%2Fthis-section-here%2Fperf%2F229408%3Fvalue%3D0222%26some_variable%3Dmeee</link>
</xml>
STR;
You might use a simple call to str_replace, like this:
$str = str_replace('&', '&amp;', $str);
And then parse the string (now valid XML) that's in $str:
$xml = simplexml_load_string($str);
var_dump($xml);
In this case, it should work...
But note that you must take care with entities: if you already have an entity like '&gt;', you must not turn it into '&amp;gt;'!
Which means that such a simple call to str_replace is not the right solution: it will probably break things on many XML feeds!
Up to you to find the right way to do that replacement -- maybe with some kind of regex...
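One possible regex, offered as a sketch rather than a complete fix: a negative lookahead escapes only those ampersands that do not already begin something shaped like an entity reference (a name, a decimal code or a hex code, followed by a semicolon). The function name escapeBareAmpersands and the sample string are hypothetical.

```php
<?php
// Escape only ampersands that are NOT already the start of an entity
// reference such as &gt;, &#38; or &#x26;.
function escapeBareAmpersands(string $xml): string
{
    return preg_replace(
        '/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)/',
        '&amp;',
        $xml
    );
}

// Illustrative input: two bare ampersands in the URL, one real entity in <note>.
$str = '<xml><link>http://foo.com/click.php?var_a=a&var_b=b&varc=c</link><note>1 &gt; 0</note></xml>';

$fixed = escapeBareAmpersands($str);
$xml = simplexml_load_string($fixed);

echo $xml->link, "\n"; // the parser decodes &amp; back to & in the value
```

The heuristic has limits: a query string that happens to contain something like &copy; is left alone, and an entity name the parser does not know will still make it fail.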
Upvotes: 4
Reputation: 9604
The XML snippet you posted is not valid. Ampersands have to be escaped; this is why the parser complains.
Upvotes: 4
Reputation: 321578
It breaks the parser because your XML is invalid - & should be encoded as &amp;.
Upvotes: 2