I am parsing a large XML file looking for certain content, e.g.
$matches = [regex]::matches($content, '(<ac:structured-macro.+?ac:name="jira".+?</ac:structured-macro>)')
i.e. return the parts that start with <ac:structured-macro>, end with </ac:structured-macro>, and have "jira" in them.
What I am finding is that it matches other records as well, e.g.
<ac:structured-macro blah blah </ac:structured-macro>
<ac:structured-macro blah ac:name="jira" blah </ac:structured-macro>
I want it to find only the ones with "jira" in them.
How do I tell it that, if it reaches the closing </ac:structured-macro> without having found the "jira" part, it should restart the search?
Once I have a match, I need to extract parts inside it. Is .+?(item1).+?(item2) the right syntax (as in C#)?
Source sample:
<ac:structured-macro ac:name="jira">
<ac:parameter ac:name="columns">key,summary,type,created,updated,due,assignee,reporter,priority,status,resolution</ac:parameter>
<ac:parameter ac:name="server">JIRA (site.atlassian.net)</ac:parameter>
<ac:parameter ac:name="serverId">72f475d9-a9b2</ac:parameter>
<ac:parameter ac:name="jqlQuery">project = PLATFORM AND issuetype in (Bug, Question, Story) AND fixVersion = 1.12.1 AND component = "UI Framework" </ac:parameter>
<ac:parameter ac:name="maximumIssues">20</ac:parameter>
</ac:structured-macro>
As mentioned in the comments - don't use regex for XML!
Instead, use the built-in capabilities of .NET to parse it and work with it:
$XmlDoc = [xml](Get-Content .\largefile.xml)
Now the $XmlDoc variable holds a live XmlDocument that we can inspect and modify programmatically (using XPath) instead of just plain text.
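For a large file, a variant of the same load step calls XmlDocument.Load(), which reads the file as a stream instead of having Get-Content split it into lines first; it may be faster, but that is not measured here. This is a minimal sketch under stated assumptions: the file name (sample.xml, written to the temp folder so the snippet runs on its own) and the <root> element that declares the ac prefix are hypothetical. The path is resolved to a full path because .NET methods do not follow PowerShell's current location.

```powershell
# Hypothetical sample file content. The ac prefix must be declared somewhere,
# otherwise the parser rejects it as an undeclared prefix.
$sample = @'
<root xmlns:ac="http://www.atlassian.com/schema/confluence/4/ac/">
  <ac:structured-macro ac:name="info">
    <ac:parameter ac:name="title">Note</ac:parameter>
  </ac:structured-macro>
  <ac:structured-macro ac:name="jira">
    <ac:parameter ac:name="maximumIssues">20</ac:parameter>
  </ac:structured-macro>
</root>
'@

# Write the sample to a temp file so the example is self-contained.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'sample.xml'
Set-Content -Path $path -Value $sample -Encoding UTF8

# .NET resolves relative paths against the process working directory,
# not PowerShell's current location, so hand Load() a full path.
$XmlDoc = New-Object System.Xml.XmlDocument
$XmlDoc.Load((Resolve-Path $path).Path)

$XmlDoc.DocumentElement.LocalName   # root

Remove-Item $path
```

With a real file, (Resolve-Path .\largefile.xml).Path turns the relative path from the answer into the full path that Load() expects.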
From the contents of your brief snippets, I'm guessing that this large XML file is an XSLT template containing JIRA macros for Confluence.
Since Confluence uses the namespace prefix ac, we'll need to create a namespace manager in order to query the document with XPath:
$XmlNSMgr = New-Object System.Xml.XmlNamespaceManager $XmlDoc.NameTable
$XmlNSMgr.AddNamespace("xsl","http://www.w3.org/1999/XSL/Transform")
$XmlNSMgr.AddNamespace("ac","http://www.atlassian.com/schema/confluence/4/ac/")
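The URI passed to AddNamespace() has to match the one declared in the file exactly, so reading it from the document instead of hard-coding it may be safer. A minimal sketch, assuming the xmlns:ac declaration sits on the document element (GetNamespaceOfPrefix() returns an empty string otherwise) and using a small hypothetical sample; the last lines show that a prefixed XPath query fails when no namespace manager is passed.

```powershell
# Hypothetical sample with the ac prefix declared on the root element.
[xml]$XmlDoc = @'
<root xmlns:ac="http://www.atlassian.com/schema/confluence/4/ac/">
  <ac:structured-macro ac:name="jira"/>
</root>
'@

# Look up the namespace URI bound to the "ac" prefix on the document element.
$acUri = $XmlDoc.DocumentElement.GetNamespaceOfPrefix('ac')

$XmlNSMgr = New-Object System.Xml.XmlNamespaceManager $XmlDoc.NameTable
$XmlNSMgr.AddNamespace('ac', $acUri)

# With the manager, the prefixed XPath expression resolves.
$count = $XmlDoc.SelectNodes('//ac:structured-macro', $XmlNSMgr).Count

# Without it, .NET cannot resolve the prefix and throws an XPathException.
$failedWithoutManager = $false
try {
    $null = $XmlDoc.SelectNodes('//ac:structured-macro')
} catch {
    $failedWithoutManager = $true
}
```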
Now you can select the desired nodes with the SelectNodes() method and an XPath expression:
$XPathExpression = '//ac:structured-macro'
$MacroNodes = $XmlDoc.SelectNodes($XPathExpression, $XmlNSMgr)
$MacroNodes is now a collection of all <ac:structured-macro> nodes found in the document.
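PowerShell also has the Select-Xml cmdlet, which takes the prefix mappings as a hashtable instead of an XmlNamespaceManager. This is an alternative to SelectNodes(), shown as a minimal sketch on a small hypothetical sample; each result is a SelectXmlInfo object whose .Node property holds the matched element.

```powershell
# Hypothetical sample with two macros.
[xml]$XmlDoc = @'
<root xmlns:ac="http://www.atlassian.com/schema/confluence/4/ac/">
  <ac:structured-macro ac:name="info"/>
  <ac:structured-macro ac:name="jira"/>
</root>
'@

# Prefix -> namespace URI mapping used by the XPath expression.
$ns = @{ ac = 'http://www.atlassian.com/schema/confluence/4/ac/' }

# Select-Xml returns SelectXmlInfo objects; .Node is the matched XmlElement.
$MacroNodes = Select-Xml -Xml $XmlDoc -XPath '//ac:structured-macro' -Namespace $ns |
    ForEach-Object { $_.Node }

# Read each macro's ac:name attribute (local name plus namespace URI).
$names = $MacroNodes | ForEach-Object { $_.GetAttribute('name', $ns.ac) }
$names   # outputs: info, jira
```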
To select only the nodes whose ac:name attribute is "jira", add a predicate to the XPath expression:
$XPathExpression = '//ac:structured-macro[@ac:name = "jira"]'
$JiraMacroNodes = $XmlDoc.SelectNodes($XPathExpression, $XmlNSMgr)
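For the follow-up in the question (getting parts inside each match, as with .+?(item1).+?(item2) capture groups), a relative XPath expression evaluated against each jira node can pick out individual <ac:parameter> values. A minimal sketch on a trimmed-down, hypothetical copy of the question's sample; the property names JqlQuery and MaximumIssues are only illustrative.

```powershell
# Trimmed, hypothetical copy of the question's sample, wrapped in a root
# element that declares the ac prefix.
[xml]$XmlDoc = @'
<root xmlns:ac="http://www.atlassian.com/schema/confluence/4/ac/">
  <ac:structured-macro ac:name="info">
    <ac:parameter ac:name="title">Note</ac:parameter>
  </ac:structured-macro>
  <ac:structured-macro ac:name="jira">
    <ac:parameter ac:name="jqlQuery">project = PLATFORM AND fixVersion = 1.12.1</ac:parameter>
    <ac:parameter ac:name="maximumIssues">20</ac:parameter>
  </ac:structured-macro>
</root>
'@

$XmlNSMgr = New-Object System.Xml.XmlNamespaceManager $XmlDoc.NameTable
$XmlNSMgr.AddNamespace('ac', 'http://www.atlassian.com/schema/confluence/4/ac/')

$JiraMacroNodes = $XmlDoc.SelectNodes('//ac:structured-macro[@ac:name = "jira"]', $XmlNSMgr)

# The XPath paths below have no leading "/", so they are evaluated relative
# to the current macro node and only see that macro's own parameters.
$results = foreach ($macro in $JiraMacroNodes) {
    $jql = $macro.SelectSingleNode('ac:parameter[@ac:name = "jqlQuery"]', $XmlNSMgr)
    $max = $macro.SelectSingleNode('ac:parameter[@ac:name = "maximumIssues"]', $XmlNSMgr)
    [pscustomobject]@{
        JqlQuery      = $jql.InnerText
        MaximumIssues = [int]$max.InnerText
    }
}
$results
```

The "info" macro never reaches the loop, which is the behaviour the question was trying to get from the regex.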
You can even edit the nodes; the changes are written out when you save the document:
$JiraMacroNodes |ForEach-Object {
$_.SetAttribute("attrName","newValue")
}
$XmlDoc.Save("C:\path\to\new.xslt")
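A self-contained sketch of the edit-and-save round trip, assuming a hypothetical output file new.xml in the temp folder and a hypothetical attribute name schema-version. It changes a parameter's text and uses the three-argument SetAttribute(localName, namespaceURI, value) overload, which puts the attribute in the ac namespace (the two-argument form in the answer creates an attribute with no namespace). The file is then reloaded to confirm the changes were saved.

```powershell
[xml]$XmlDoc = @'
<root xmlns:ac="http://www.atlassian.com/schema/confluence/4/ac/">
  <ac:structured-macro ac:name="jira">
    <ac:parameter ac:name="maximumIssues">20</ac:parameter>
  </ac:structured-macro>
</root>
'@
$acUri = 'http://www.atlassian.com/schema/confluence/4/ac/'
$XmlNSMgr = New-Object System.Xml.XmlNamespaceManager $XmlDoc.NameTable
$XmlNSMgr.AddNamespace('ac', $acUri)

$JiraMacroNodes = $XmlDoc.SelectNodes('//ac:structured-macro[@ac:name = "jira"]', $XmlNSMgr)
foreach ($macro in $JiraMacroNodes) {
    # Change the text content of a child parameter.
    $param = $macro.SelectSingleNode('ac:parameter[@ac:name = "maximumIssues"]', $XmlNSMgr)
    $param.InnerText = '50'
    # Hypothetical attribute in the ac namespace; SetAttribute returns the
    # value, so discard it to keep the output clean.
    $null = $macro.SetAttribute('schema-version', $acUri, '1')
}

# Save to a full path (.NET ignores PowerShell's current location), then reload.
$outPath = Join-Path ([System.IO.Path]::GetTempPath()) 'new.xml'
$XmlDoc.Save($outPath)
[xml]$reloaded = Get-Content -Path $outPath -Raw
Remove-Item $outPath
```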