Reputation: 435
I have an input XML file that is not correctly formatted (i.e. it has '&' instead of '&amp;'). When I try to load this XML using PHP DOM with $doc->load("file.xml"), it throws an error and stops parsing.
Is there any way to load this malformed XML? And no, I can't edit the source XML file. I did try using $doc->loadHTML(), but it throws errors all over the place.
I wanted to know if there is a proper way to do this (like loading the file contents and changing them using a regex or something similar).
Upvotes: 0
Views: 1696
Reputation: 302
Try setting $doc->validateOnParse = false; before loading your XML via $doc->loadHTML(...).
Upvotes: 1
Reputation: 149
If you are sure that's the only thing making it not validate, then you could try loading the file into a string with the file_get_contents() function, then search and replace through the string to change the &'s into &amp;'s, then pass that string to SimpleXML, like $xml = simplexml_load_string($cleaned_string);
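A minimal sketch of that approach, assuming bare ampersands are the only problem. The helper name escape_bare_ampersands and the sample input are made up for illustration; a regex is used instead of a plain str_replace() so that entities that are already correct (such as &amp; or &#38;) are not escaped a second time. It would also rewrite ampersands inside CDATA sections, so it is not suitable if the file uses those.

```php
<?php
// Hypothetical helper (not part of PHP): escape only the ampersands
// that do not already begin an entity or character reference.
function escape_bare_ampersands(string $xml): string
{
    // The negative lookahead skips &name;, &#123; and &#x1F; forms.
    return preg_replace(
        '/&(?![A-Za-z][A-Za-z0-9]*;|#[0-9]+;|#x[0-9A-Fa-f]+;)/',
        '&amp;',
        $xml
    );
}

// Sample input standing in for the real file.
// In practice: $raw = file_get_contents('file.xml');
$raw = '<items><item>Fish & Chips</item><item>Salt &amp; Vinegar</item></items>';

$cleaned_string = escape_bare_ampersands($raw);
$xml = simplexml_load_string($cleaned_string);

echo $xml->item[0], "\n"; // prints "Fish & Chips"
```

The same cleaned string could be passed to $doc->loadXML($cleaned_string) if you want to stay with DOM rather than SimpleXML.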
Upvotes: 0
Reputation: 26627
First, check that it's the & that's causing the error and not something else.
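One way to run that check, as a rough sketch: have libxml collect its errors instead of emitting warnings, then print where the parser gave up. The inline sample string stands in for the real file and is an assumption for illustration; with an actual file you would call $doc->load('file.xml') instead.

```php
<?php
// Collect libxml errors internally instead of raising PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();

// Sample malformed input (assumed); in practice use $doc->load('file.xml').
$ok = $doc->loadXML('<items><item>Fish & Chips</item></items>');

// Each LibXMLError records a message plus the line and column of the failure.
$errors = libxml_get_errors();
foreach ($errors as $error) {
    printf(
        "line %d, column %d: %s\n",
        $error->line,
        $error->column,
        trim($error->message)
    );
}

// Empty the error buffer so later parses start clean.
libxml_clear_errors();
```

If every reported position points at a bare &, the search-and-replace route is probably enough; if other errors show up, the file needs more than that.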
One way or another, you'll have to modify the XML to get it parsed. The HTML in loadHTML is loaded from a string, so can't you just replace the invalid characters with the correct ones?
If your installation supports the PHP Tidy extension (http://php.net/manual/en/book.tidy.php) you could try to clean it up with that, though in my experience it's far from foolproof.
Upvotes: 0