Nithin

Reputation: 435

Load an invalid XML in PHP DOM

I have an input XML file that is not well-formed (i.e. it has '&' instead of '&amp;'). When I try to load this XML using PHP DOM with $doc->load("file.xml"), it throws an error and stops parsing.

Is there any way to load this malformed XML? And no, I can't edit the source XML file. I did try using $doc->loadHTML(), but it throws errors all over the place.

I wanted to know if there is a proper way to do this (like loading the file contents and fixing them with a regex or something similar).

Upvotes: 0

Views: 1696

Answers (3)

Peavey

Reputation: 302

Try setting $doc->validateOnParse = false; before loading your XML via $doc->loadHTML(...).

Upvotes: 1

Forrest

Reputation: 149

If you are sure that's the only thing keeping it from parsing, you could load the file into a string with file_get_contents(), search and replace through the string to change the &'s into &amp;'s, and then pass that string to SimpleXML, like $xml = simplexml_load_string($cleaned_string);
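
A minimal sketch of that search-and-replace idea, assuming the only problem is unescaped ampersands. The helper name escapeBareAmpersands and the sample XML are made up for illustration, and the inline string stands in for file_get_contents('file.xml'). A plain str_replace would also turn an existing &amp; into &amp;amp;, so this uses a regex that skips anything already shaped like an entity reference. It would still mangle ampersands inside CDATA sections, and it leaves undefined entities such as &foo; untouched.

```php
<?php
// Hypothetical helper: escape ampersands that do not already start an entity reference.
function escapeBareAmpersands(string $xml): string
{
    // The negative lookahead skips named (&amp;), decimal (&#38;) and hex (&#x26;) references.
    $pattern = '/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)/';
    $result = preg_replace($pattern, '&amp;', $xml);

    // preg_replace() returns null on failure; fall back to the unmodified input.
    return $result ?? $xml;
}

// Stand-in for file_get_contents('file.xml').
$raw = '<items><item>Tom & Jerry</item><item>Fish &amp; Chips</item></items>';

$doc = new DOMDocument();
$loaded = $doc->loadXML(escapeBareAmpersands($raw));
```

The same cleaned string could be passed to simplexml_load_string() instead of DOMDocument::loadXML().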

Upvotes: 0

bcoughlan

Reputation: 26627

First, check that it's the & that's causing the error and not something else.
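
One way to check this is to have libxml collect its parse errors instead of emitting warnings, then look at the reported lines and messages. This is only a sketch: the sample XML is invented, and the inline string stands in for the contents of file.xml. The exact message text depends on the libxml2 version, so none is shown here.

```php
<?php
// Collect libxml errors internally instead of raising PHP warnings.
libxml_use_internal_errors(true);

// Stand-in for file_get_contents('file.xml'); contains one bare ampersand.
$raw = '<items><item>Tom & Jerry</item></items>';

$doc = new DOMDocument();
$ok = $doc->loadXML($raw);

// Gather every error libxml reported, with its position in the input.
$messages = [];
foreach (libxml_get_errors() as $error) {
    $messages[] = sprintf(
        'line %d, column %d: %s',
        $error->line,
        $error->column,
        trim($error->message)
    );
}
libxml_clear_errors();

print_r($messages);
```

If the reported positions all point at ampersands, escaping them is probably enough. Anything else in the list means the file has other problems too.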

One way or another, you'll have to modify the XML to get it parsed. loadHTML takes its input from a string, so can't you just replace the invalid characters with the correct ones before parsing?

If your installation supports the PHP Tidy extension (http://php.net/manual/en/book.tidy.php) you could try to clean it up with that, though in my experience it's far from foolproof.

Upvotes: 0
