s3v3n

Reputation: 8446

Insert CDATA into an XML

I'm in a real hurry right now, and I'm begging REGEX masters for help! I'm receiving XML through an HTTP request, and I can't parse it because it contains special characters that aren't wrapped in CDATA sections.

example XML:

<root>
    <node>good node</node>
    <node>bad node containing &</node>
</root>

Trying to parse this XML with simplexml_load_string($xml) I get:

Warning: simplexml_load_string() [function.simplexml-load-string]:
Entity: line 3: parser error : xmlParseEntityRef: no name in /..../file.php on line ##

Supposing that the bad nodes will not contain > or <, I need a REGEX that will wrap the text in those nodes in CDATA sections. I guess it will involve some lookarounds; I just can't work it out quickly.

Thank you!

Upvotes: 1

Views: 1429

Answers (1)

Code Jockey

Reputation: 6721

If you can indeed assume that there will be no < or > characters inside the nodes you want to CDATA-ize, then this should work just fine for your situation:

>(?=[^<&]*&)([^<]*)<

replacing with

><![CDATA[\1]]><

This expression only matches nodes whose text contains an & character (whether or not it is part of an entity) and wraps the contents of those nodes in a CDATA section. If you need to ignore & characters inside entities, that's considerably tougher, but I'd be willing to give it a look.
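
For reference, here is a minimal sketch of how this could be wired up in PHP with preg_replace before handing the string to simplexml_load_string. It assumes the sample XML from the question (with the closing tag written as </root>) and the same "no < or > inside node text" assumption; the variable names are just for illustration. The replacement writes the section as <![CDATA[ ... ]]> and puts back the surrounding > and <, since the pattern consumes both.

```php
<?php
// Sample input, assumed from the question; nowdoc syntax keeps "&" and "$" literal.
$xml = <<<'XML'
<root>
    <node>good node</node>
    <node>bad node containing &</node>
</root>
XML;

// Match node text that contains at least one "&" and no "<".
// The lookahead (?=[^<&]*&) requires an "&" before the next "<".
$pattern = '/>(?=[^<&]*&)([^<]*)</';

// $1 is the captured text (equivalent to \1); the leading ">" and
// trailing "<" are restored because the pattern consumed them.
$replacement = '><![CDATA[$1]]><';

$fixed = preg_replace($pattern, $replacement, $xml);

// Parse the repaired string; returns false if it is still not well-formed.
$doc = simplexml_load_string($fixed);

if ($doc !== false) {
    echo $doc->node[1], "\n";
}
```

Nodes without an & are left untouched, so already valid text is not wrapped unnecessarily.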

Upvotes: 2
