Reputation: 11
I have a LARGE XML file. I'm troubleshooting some things, and I would like to extract specific nodes from the XML file. I don't want a SimpleXML object, I want to make a new file with the raw string matching what I want (posting this on bash/sed/php).
<?xml version="1.0" encoding="UTF-8"?>
<definition></definition>
<metadata></metadata>
<nodeToRegex>
<nodeImightwant>
<subnode>
<subsubnode1></subsubnode1>
<subsubnodeToCheck>stringCheck</subnodeToCheck>
<subsubnode2></subsubnode2>
</subnode>
</nodeImightwant>
<nodeImightwant></nodeImightwant>
<nodeImightwant></nodeImightwant>
</nodeToRegex>
So from this XML file, I want all lines from every node except the nodeToRegex. From nodeToRegex, I only want the nodeImightwant if the stringCheck string equals "aValidString". Can this be done via regex or should I just copy and paste the stuff out of the file? (my regex skills are subpar)
Upvotes: 0
Views: 977
Reputation: 44851
Don't parse XML with regexes. There is no reason you can't repackage/rearrange the data using SimpleXML, but trying to do it with a regex is a recipe for lots of headaches and, ultimately, broken code.
See this classic example for why parsing XML/HTML/XHTML with regexes is the road to madness.
If you insist on using a regex, just replace the nodes you don't want, like this:
$myxml = preg_replace('~<nodeToRegex>.*?</nodeToRegex>~', '', $myxml);
Upvotes: 1