Reputation: 8446
I'm in a real hurry right now, and I'm begging REGEX masters for help! I'm receiving XML through an HTTP request, and I just can't parse it because it contains special characters that are not wrapped in CDATA sections.
Example XML:
<root>
<node>good node</node>
<node>bad node containing &</node>
</root>
Trying to parse this XML with simplexml_load_string($xml), I get:
Warning: simplexml_load_string() [function.simplexml-load-string]:
Entity: line 3: parser error : xmlParseEntityRef: no name in /..../file.php on line ##
Supposing that the bad nodes will not contain > or <, I need a REGEX that will wrap the text in those nodes in CDATA sections. I guess it will involve some lookarounds, but I just can't work it out quickly.
Thank you!
Upvotes: 1
Views: 1429
Reputation: 6721
If you can indeed assume that there will be no < or > characters inside the nodes you want to CDATA-ize, then this should work just fine for your situation:
>(?=[^<&]*&)([^<]*)<
replacing with (the leading > and trailing < put back the delimiters that the match consumes)
><![CDATA[\1]]><
This expression only looks for nodes that contain & characters (whether or not they are part of HTML entities), then wraps the contents of those nodes in a CDATA section. If you need to ignore & characters inside entities, that's a considerable bit tougher, but I'd be willing to give it a look.
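For reference, here is a minimal PHP sketch of that replacement. It assumes preg_replace (where the backreference is written $1), the sample document from the question with its closing tag written as </root>, and illustrative variable names that are not from the original post.

```php
<?php
// Sample input from the question (assumption: the last tag is </root>).
$xml = "<root>\n<node>good node</node>\n<node>bad node containing &</node>\n</root>";

// Match the text between '>' and '<' only when it contains an ampersand.
// The lookahead checks for '&' before any '<' is reached.
$pattern = '/>(?=[^<&]*&)([^<]*)</';

// Put the consumed '>' and '<' back around a CDATA section.
$fixed = preg_replace($pattern, '><![CDATA[$1]]><', $xml);

// LIBXML_NOCDATA merges CDATA sections into ordinary text nodes.
$doc = simplexml_load_string($fixed, 'SimpleXMLElement', LIBXML_NOCDATA);

echo $fixed, "\n";
echo (string) $doc->node[1], "\n";
```

Note that a node whose text already holds a valid entity such as &amp; would be wrapped too, and inside CDATA that entity is read as literal text, so this is only safe when the ampersands in the input are all unescaped.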
Upvotes: 2