Reputation: 37
I have two XML files: one with default names and values (named Test.xml) and one with just the default names (named document.xml). The goal is to replace the default names with the values, but only on the first occurrence.
Here is the Test.xml:
<XML-TEST>
<MyText>Dies ist ein Test</MyText>
<MyTexttwo>Dies ist noch ein Test</MyTexttwo>
</XML-TEST>
Here is the document.xml (the relevant part is near the end):
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas"
xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex"
xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex"
xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"
xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing"
xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"
xmlns:w10="urn:schemas-microsoft-com:office:word"
xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml"
xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml"
xmlns:w16se="http://schemas.microsoft.com/office/word/2015/wordml/symex"
xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup"
xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk"
xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml"
xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape"
mc:Ignorable="w14 w15 w16se wp14">
<w:body>
<w:p w:rsidR="00E64ECE" w:rsidRDefault="00E64ECE" w:rsidP="00E64ECE">
<w:proofErr w:type="spellStart" />
<w:r>
<w:t>MyText</w:t>
</w:r>
<w:proofErr w:type="spellEnd" />
</w:p>
<w:p w:rsidR="00D50239" w:rsidRPr="00E64ECE" w:rsidRDefault="00E64ECE" w:rsidP="00E64ECE">
<w:r>
<w:t>MyTexttwo</w:t>
</w:r>
<w:bookmarkStart w:id="0" w:name="_GoBack" />
<w:bookmarkEnd w:id="0" />
</w:p>
<w:sectPr w:rsidR="00D50239" w:rsidRPr="00E64ECE">
<w:pgSz w:w="11906" w:h="16838" />
<w:pgMar w:top="1417" w:right="1417" w:bottom="1134" w:left="1417" w:header="708" w:footer="708" w:gutter="0" />
<w:cols w:space="708" />
<w:docGrid w:linePitch="360" />
</w:sectPr>
</w:body>
</w:document>
What am I doing in PowerShell?
I save Test.xml (the one with values) in a hashtable:
PS> $XMLSourceHashtable

Name                           Value
----                           -----
MyText                         Dies ist ein Test
MyTexttwo                      Dies ist noch ein Test
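How the hashtable is filled isn't shown here. One possible way, sketched under the assumption that every child element of the root becomes one entry (element name as key, element text as value). The XML is loaded from a string with LoadXml so the snippet runs on its own; a real script would call .Load('Test.xml') instead.

```powershell
# In-memory stand-in for Test.xml; a real script would use $source.Load('Test.xml')
$source = New-Object xml
$source.LoadXml(@'
<XML-TEST>
  <MyText>Dies ist ein Test</MyText>
  <MyTexttwo>Dies ist noch ein Test</MyTexttwo>
</XML-TEST>
'@)

# One entry per child element of the root: element name -> element text.
# SelectNodes('*') returns only element children (no comments or text nodes).
$XMLSourceHashtable = @{}
foreach ($node in $source.DocumentElement.SelectNodes('*')) {
    $XMLSourceHashtable[$node.Name] = $node.InnerText
}

$XMLSourceHashtable
```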
Then I load document.xml into a variable $DocumentXml and use foreach to replace what I need:
foreach ($key in ($XMLSourceHashtable.GetEnumerator())) {
    if ($key.Value -eq "false") {
        # If the value is "false", replace the 1:1 name with an empty checkbox character
        $DocumentXml.InnerXml = $DocumentXml.InnerXml.Replace($key.Name.Trim(), "☐")
    } elseif ($key.Value -eq "true") {
        # If the value is "true", replace the 1:1 name with a checked checkbox character
        $DocumentXml.InnerXml = $DocumentXml.InnerXml.Replace($key.Name.Trim(), "☒")
    } else {
        # Everything else needs to be replaced by the value in the hashtable
        $DocumentXml.InnerXml = $DocumentXml.InnerXml.Replace($key.Name.Trim(), $key.Value.Trim())
    }
}
The first two branches (the if and the elseif) work fine and can be ignored. It's the else branch I'm concerned about.
What happens?
The text is replaced, of course, but the Replace method does the following. Values in document.xml are replaced like this:
"MyText" → "Dies ist ein Test"
"MyTexttwo" → "Dies ist ein Testtwo"
but it should be:
"MyText" → "Dies ist ein Test"
"MyTexttwo" → "Dies ist noch ein Test"
The point is that "MyText" is being recognized inside "MyTexttwo". Each name is actually unique, but it's not handled as if it were unique. I know it's possible to replace only the first occurrence, but only with a regex, and I can't convert the XML to a regex and back again. Is there something else I can do?
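A minimal in-memory reproduction of the effect described above. It assumes "MyText" happens to be processed before "MyTexttwo", which [ordered] forces here; with a regular hashtable the enumeration order is not guaranteed, so whether the bug shows up can depend on the keys. (If "MyTexttwo" were processed first, its replacement no longer contains "MyText" and the result would be correct.)

```powershell
# Hypothetical stand-in for the two <w:t> runs in document.xml
$text = '<w:t>MyText</w:t><w:t>MyTexttwo</w:t>'

# [ordered] forces 'MyText' to be processed first; a plain hashtable
# does not guarantee any particular enumeration order
$values = [ordered]@{
    'MyText'    = 'Dies ist ein Test'
    'MyTexttwo' = 'Dies ist noch ein Test'
}

foreach ($entry in $values.GetEnumerator()) {
    # String.Replace swaps every occurrence of the substring, including
    # the 'MyText' prefix inside 'MyTexttwo'
    $text = $text.Replace($entry.Key, $entry.Value)
}

$text
```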
Upvotes: 2
Views: 676
Reputation: 61188
Although Tomalak's advice to NEVER use string replacement on XML is good advice, here's an answer to your question about "MyText" being recognized inside "MyTexttwo" even though each name is unique.
The Replace method you use does not match the WHOLE string; it replaces every substring it finds. "MyTexttwo" starts with "MyText", so when "MyText" is processed first, that part of the name is replaced and "MyTexttwo" no longer exists.
If you still want to use string replacement, but only replace when the complete text matches and not just part of it, I would suggest:
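$nameToReplace = [regex]::Escape($key.Name.Trim())
$DocumentXml.InnerXml = $DocumentXml.InnerXml -replace "(?<=>)$nameToReplace(?=<)", $key.Value.Trim()
The (?<=>) and (?=<) lookarounds are zero-width positional assertions: the name only matches when it directly follows the > that closes a tag and is directly followed by the < of the next tag, i.e. when it is the entire text of an element. [regex]::Escape makes sure any regex metacharacters in the name are matched literally.
-replace is case-insensitive by default. If you also need to be sure that the replacement only takes place if the casing also matches, you can use -creplace:
$nameToReplace = [regex]::Escape($key.Name.Trim())
$DocumentXml.InnerXml = $DocumentXml.InnerXml -creplace "(?<=>)$nameToReplace(?=<)", $key.Value.Trim()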
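The question asks for only the first occurrence to be replaced, while -replace replaces every match. A sketch of one way to limit it, using the .NET Regex.Replace(input, replacement, count) overload with a pattern that only matches the name when it is the complete text between two tags. The input string and values below are illustrative only.

```powershell
# Hypothetical input: the same name appears twice, plus a longer name that starts with it
$xmlText = '<w:t>MyText</w:t><w:t>MyTexttwo</w:t><w:t>MyText</w:t>'
$name    = 'MyText'
$value   = 'Dies ist ein Test'

# Match the name only when it is the entire text between two tags;
# [regex]::Escape makes any regex metacharacters in the name literal
$regex = [regex]('(?<=>)' + [regex]::Escape($name) + '(?=<)')

# Regex.Replace(input, replacement, count) replaces at most 'count' matches.
# '$' is special in .NET replacement strings, so it is doubled to stay literal.
$result = $regex.Replace($xmlText, $value.Replace('$', '$$'), 1)

$result
```

In the question's loop the same call would be applied to $DocumentXml.InnerXml. Values containing characters such as & or < would still produce invalid XML this way, which is one reason the XPath approach is safer.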
Upvotes: -1
Reputation: 338326
Your approach is much too complicated. Use XPath. In principle - load, modify, save:
$document = New-Object xml
$document.Load('Document.xml')
$element = $document.SelectSingleNode("//some/path")
$element.InnerText = "some new value"
$document.Save('Document_2.xml')
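Test.xml has no namespaces, so this plain pattern applies to it directly. A self-contained sketch of the same load/modify steps, using LoadXml and OuterXml in place of Load and Save so it runs without files (the element and the new value are only illustrative):

```powershell
$document = New-Object xml
$document.LoadXml('<XML-TEST><MyText>Dies ist ein Test</MyText><MyTexttwo>Dies ist noch ein Test</MyTexttwo></XML-TEST>')

# XPath matches element names exactly, so //MyText never selects <MyTexttwo>
$element = $document.SelectSingleNode('//MyText')
$element.InnerText = 'some new value'

# OuterXml shows the modified document; .Save() would write it to disk instead
$document.OuterXml
```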
The only slight complication here is that you are dealing with a Word document, and Word documents use XML namespaces (written as xmlns:foo="...namespace URI..." in the XML source), so you need to use namespaces, too (see: Using PowerShell, how do I add multiple namespaces (one of which is the default namespace)?):
$document = New-Object xml
$document.Load('Document.xml')
# use a namespace manager to register the w: namespace prefix
$namespaces = New-Object System.Xml.XmlNamespaceManager $document.NameTable
$namespaces.AddNamespace('w', 'http://schemas.openxmlformats.org/wordprocessingml/2006/main')
# a hashtable must be enumerated explicitly; foreach over it directly yields the hashtable itself
foreach ($item in $XMLSourceHashtable.GetEnumerator()) {
    $searchText = $item.Name
    $element = $document.SelectSingleNode("//w:t[.='$searchText']", $namespaces)
    # skip names that do not occur in the document
    if ($element) {
        $element.InnerText = $item.Value
    }
}
$document.Save('Document_2.xml')
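A self-contained sketch of the namespace-aware version, run against a trimmed-down in-memory copy of document.xml (only the w: namespace and the two <w:t> runs are kept) so the effect on "MyText" and "MyTexttwo" can be checked directly. Because the XPath comparison is exact, a plain hashtable works here regardless of its enumeration order.

```powershell
# Trimmed-down, in-memory stand-in for document.xml
$document = New-Object xml
$document.LoadXml(@'
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p><w:r><w:t>MyText</w:t></w:r></w:p>
    <w:p><w:r><w:t>MyTexttwo</w:t></w:r></w:p>
  </w:body>
</w:document>
'@)

# Register the w: prefix so XPath can resolve it
$namespaces = New-Object System.Xml.XmlNamespaceManager $document.NameTable
$namespaces.AddNamespace('w', 'http://schemas.openxmlformats.org/wordprocessingml/2006/main')

$values = @{
    'MyText'    = 'Dies ist ein Test'
    'MyTexttwo' = 'Dies ist noch ein Test'
}

foreach ($item in $values.GetEnumerator()) {
    # [.='...'] compares the whole text of <w:t>, so 'MyText' never matches 'MyTexttwo'
    $element = $document.SelectSingleNode("//w:t[.='$($item.Key)']", $namespaces)
    if ($element) { $element.InnerText = $item.Value }
}

# Collect the resulting texts in document order
$texts = @($document.SelectNodes('//w:t', $namespaces) | ForEach-Object InnerText)
$texts
```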
The "//w:t[.='$searchText']" string will be interpolated into XPath expressions like //w:t[.='MyText'], and this path selects all <w:t> elements in the input XML whose value is exactly 'MyText'. Because [.='...'] compares the element's entire text, <w:t>MyTexttwo</w:t> is not selected. Using .SelectSingleNode() returns only the first of those, which seems to be what you want.
You can use .SelectNodes() and another foreach loop to edit all occurrences:
foreach ($element in $document.SelectNodes("//w:t[.='$searchText']", $namespaces)) {
$element.InnerText = $item.Value
}
Upvotes: 3