J.Doe

Reputation: 37

InnerXml replace, but only once

I have two XML files: one with default names and values (Test.xml) and one with just the default names (document.xml). The goal is to replace the default names with the values, but only on the first occurrence.

Here is the Test.xml:

<XML-TEST>
    <MyText>Dies ist ein Test</MyText>
    <MyTexttwo>Dies ist noch ein Test</MyTexttwo>
</XML-TEST>

Here is document.xml (the relevant part is near the end):

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas"
    xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex"
    xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    xmlns:o="urn:schemas-microsoft-com:office:office"
    xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
    xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"
    xmlns:v="urn:schemas-microsoft-com:vml"
    xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing"
    xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"
    xmlns:w10="urn:schemas-microsoft-com:office:word"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml"
    xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml"
    xmlns:w16se="http://schemas.microsoft.com/office/word/2015/wordml/symex"
    xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup"
    xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk"
    xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml"
    xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape"
    mc:Ignorable="w14 w15 w16se wp14">
  <w:body>
    <w:p w:rsidR="00E64ECE" w:rsidRDefault="00E64ECE" w:rsidP="00E64ECE">
      <w:proofErr w:type="spellStart" />
      <w:r>
        <w:t>MyText</w:t>
      </w:r>
      <w:proofErr w:type="spellEnd" />
    </w:p>
    <w:p w:rsidR="00D50239" w:rsidRPr="00E64ECE" w:rsidRDefault="00E64ECE" w:rsidP="00E64ECE">
      <w:r>
        <w:t>MyTexttwo</w:t>
      </w:r>
      <w:bookmarkStart w:id="0" w:name="_GoBack" />
      <w:bookmarkEnd w:id="0" />
    </w:p>
    <w:sectPr w:rsidR="00D50239" w:rsidRPr="00E64ECE">
      <w:pgSz w:w="11906" w:h="16838" />
      <w:pgMar w:top="1417" w:right="1417" w:bottom="1134" w:left="1417" w:header="708" w:footer="708" w:gutter="0" />
      <w:cols w:space="708" />
      <w:docGrid w:linePitch="360" />
    </w:sectPr>
  </w:body>
</w:document>

What am I doing with PowerShell?

  1. I save the Test.xml (the one with values) in a hashtable:

    PS> $XMLSourceHashtable
    
    Name         Value
    ----         -----
    MyText       Dies ist ein Test
    MyTexttwo    Dies ist noch ein Test
    
  2. I load document.xml into a variable, $DocumentXml.

  3. I use foreach to replace what I need:

    foreach ($key in $XMLSourceHashtable.GetEnumerator()) {
        if ($key.Value -eq "false") {
            # If the value is "false", replace the name 1:1 with an empty check box character
            $DocumentXml.InnerXml = $DocumentXml.InnerXml.Replace($key.Name.Trim(), "☐")
        } elseif ($key.Value -eq "true") {
            # If the value is "true", replace the name 1:1 with a checked check box character
            $DocumentXml.InnerXml = $DocumentXml.InnerXml.Replace($key.Name.Trim(), "☒")
        } else {
            # Everything else is replaced by its value from the hashtable
            $DocumentXml.InnerXml = $DocumentXml.InnerXml.Replace($key.Name.Trim(), $key.Value.Trim())
        }
    }
    

The first two branches (the if and the elseif) work fine and can be ignored here. It's the else branch I'm concerned about.
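Steps 1 and 2 are described above but not shown. A minimal sketch of how they might look, assuming Test.xml has the flat structure shown earlier (its content is inlined so the snippet runs on its own, and the document.xml path in the comment is a hypothetical placeholder):

```powershell
# Step 1: read Test.xml into a hashtable of name -> value.
# The content is inlined here; with the real file you could use
# [xml]$source = Get-Content -Raw -Path .\Test.xml instead.
[xml]$source = @'
<XML-TEST>
    <MyText>Dies ist ein Test</MyText>
    <MyTexttwo>Dies ist noch ein Test</MyTexttwo>
</XML-TEST>
'@

$XMLSourceHashtable = @{}
foreach ($node in $source.DocumentElement.ChildNodes) {
    # Only element nodes carry a name/value pair (skip comments and similar nodes)
    if ($node.NodeType -eq [System.Xml.XmlNodeType]::Element) {
        $XMLSourceHashtable[$node.LocalName] = $node.InnerText.Trim()
    }
}

$XMLSourceHashtable

# Step 2 (not run here, because it needs the real file; the path is hypothetical):
# $DocumentXml = New-Object xml
# $DocumentXml.Load((Resolve-Path .\document.xml).Path)
```

Loading with XmlDocument.Load lets .NET read the encoding from the XML declaration, and passing a resolved full path avoids .NET resolving a relative path against a different working directory than PowerShell's current location.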

What happens?

The text is replaced, of course, but the Replace method does the following:

Values in document.xml are replaced like this:

"MyText" → "Dies ist ein Test"
"MyTexttwo" → "Dies ist ein Testtwo"

but they should be:

"MyText" → "Dies ist ein Test"
"MyTexttwo" → "Dies ist noch ein Test"

The problem is that "MyText" is matched inside "MyTexttwo". Each name is actually unique, but it isn't treated as unique. I know it's possible to replace only the first occurrence, but only with a regex, and I can't convert the XML to a regex and back again. Is there something else I can do?
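The behavior can be reproduced with a short, self-contained sketch. The fragment is a hypothetical stand-in for document.xml, and it assumes the loop happens to visit MyText before MyTexttwo (a plain hashtable does not guarantee any particular order):

```powershell
# Hypothetical stand-in for the relevant part of document.xml
$xml = '<w:p><w:r><w:t>MyText</w:t></w:r></w:p><w:p><w:r><w:t>MyTexttwo</w:t></w:r></w:p>'

# String.Replace is a plain substring replacement: it replaces EVERY occurrence
# of 'MyText', including the first six characters of 'MyTexttwo'
$afterFirst = $xml.Replace('MyText', 'Dies ist ein Test')
$afterFirst

# 'MyTexttwo' no longer exists, so the second replacement changes nothing
$afterSecond = $afterFirst.Replace('MyTexttwo', 'Dies ist noch ein Test')
$afterSecond -ceq $afterFirst   # -> True
```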

Upvotes: 2

Views: 676

Answers (2)

Theo

Reputation: 61188

Although Tomalak's advice to NEVER use string replacement on XML is good advice, here's an answer to your point that "MyText" is being recognized in "MyTexttwo", even though each name is unique.

The Replace method you use does not match the WHOLE string; it replaces every occurrence of the substring. "MyTexttwo" starts with "MyText", so in your loop that part of the name is replaced, and "MyTexttwo" then no longer exists.

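If you still want to use string replacement, but only replace a name where the complete text matches and not just part of it, I would suggest:

$nameToReplace = [regex]::Escape($key.Name.Trim())
$DocumentXml.InnerXml = $DocumentXml.InnerXml -replace "(?<=>)$nameToReplace(?=<)", $key.Value.Trim()

The (?<=>) and (?=<) lookarounds are positional assertions: the name only matches when it is immediately preceded by > and followed by <, that is, when it is the entire text of an element such as <w:t>MyText</w:t>. Anchors like \A and \z would not work here, because they refer to the start and end of the whole InnerXml string, not of a single element's text. [regex]::Escape makes sure any special characters in the name are matched literally.

If you also need to be sure that the replacement only takes place if the casing also matches, use the case-sensitive -creplace operator:

$nameToReplace = [regex]::Escape($key.Name.Trim())
$DocumentXml.InnerXml = $DocumentXml.InnerXml -creplace "(?<=>)$nameToReplace(?=<)", $key.Value.Trim()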
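The question also asks to replace a name only on its first occurrence. The -replace and -creplace operators always replace every match, but the .NET Regex.Replace(input, replacement, count) instance method accepts a maximum count. A minimal sketch on a hypothetical fragment, which also uses (?<=>) and (?=<) lookarounds so that the name must be the complete text between two tags:

```powershell
# Hypothetical fragment: MyText appears twice as a complete <w:t> text
# and once as the prefix of MyTexttwo
$xml = '<w:t>MyText</w:t><w:t>MyTexttwo</w:t><w:t>MyText</w:t>'

$name  = 'MyText'
$value = 'Dies ist ein Test'   # a '$' in the value would act as a substitution; write it as '$$'

# (?<=>) and (?=<) only allow a match that is the entire text between two tags;
# [regex]::Escape makes any regex metacharacters in the name literal
$regex = [regex]"(?<=>)$([regex]::Escape($name))(?=<)"

# Regex.Replace(input, replacement, count): count = 1 stops after the first match
$result = $regex.Replace($xml, $value, 1)
$result
# -> <w:t>Dies ist ein Test</w:t><w:t>MyTexttwo</w:t><w:t>MyText</w:t>
```

Unlike -replace, a [regex] object created this way is case-sensitive by default, so it behaves like -creplace.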

Upvotes: -1

Tomalak

Reputation: 338326

Your approach is much too complicated. Use XPath. In principle, it's load, modify, save:

$document = New-Object xml
$document.Load('Document.xml')

$element = $document.SelectSingleNode("//some/path")
$element.InnerText = "some new value"

$document.Save('Document_2.xml')

The only slight complication here is that you are dealing with a Word document, and Word documents use XML namespaces (written as xmlns:foo="...namespace URI..." in the XML source), so you need to use namespaces, too (see: Using PowerShell, how do I add multiple namespaces (one of which is the default namespace)?):

$document = New-Object xml
$document.Load('Document.xml')

# use a namespace manager to register the w: namespace prefix
$namespaces = New-Object System.Xml.XmlNamespaceManager $document.NameTable
$namespaces.AddNamespace('w', 'http://schemas.openxmlformats.org/wordprocessingml/2006/main')

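# GetEnumerator() is needed: foreach over the hashtable itself yields the whole table once
foreach ($item in $XMLSourceHashtable.GetEnumerator()) {
    $searchText = $item.Name
    $element = $document.SelectSingleNode("//w:t[.='$searchText']", $namespaces)
    # SelectSingleNode returns $null when nothing matches
    if ($null -ne $element) {
        $element.InnerText = $item.Value
    }
}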

$document.Save('Document_2.xml')

The string "//w:t[.='$searchText']" will be interpolated into XPath expressions like //w:t[.='MyText'], and this path selects all <w:t> elements in the input XML whose complete text is 'MyText'. Using .SelectSingleNode() returns only the first of those, which seems to be what you want.
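To see the exact-match behavior end to end, here is a self-contained sketch of the loop above. The document is a hypothetical three-paragraph stand-in that keeps the real w: namespace URI, and it assumes no name contains a single quote (that would terminate the XPath string literal):

```powershell
# Hypothetical stand-in for document.xml
$sample = @'
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p><w:r><w:t>MyText</w:t></w:r></w:p>
    <w:p><w:r><w:t>MyTexttwo</w:t></w:r></w:p>
    <w:p><w:r><w:t>MyText</w:t></w:r></w:p>
  </w:body>
</w:document>
'@
$document = New-Object xml
$document.LoadXml($sample)

$namespaces = New-Object System.Xml.XmlNamespaceManager $document.NameTable
$namespaces.AddNamespace('w', 'http://schemas.openxmlformats.org/wordprocessingml/2006/main')

$XMLSourceHashtable = @{ MyText = 'Dies ist ein Test'; MyTexttwo = 'Dies ist noch ein Test' }

foreach ($item in $XMLSourceHashtable.GetEnumerator()) {
    # [.='...'] compares the complete text of <w:t>, so MyText never matches MyTexttwo
    $element = $document.SelectSingleNode("//w:t[.='$($item.Name)']", $namespaces)
    if ($null -ne $element) {
        $element.InnerText = $item.Value
    }
}

# Text of every <w:t>, in document order
$texts = @($document.SelectNodes('//w:t', $namespaces) | ForEach-Object { $_.InnerText })
$texts
# -> Dies ist ein Test
#    Dies ist noch ein Test
#    MyText
```

Because the match is exact, the result no longer depends on the order in which the hashtable is enumerated, and the second MyText stays untouched because SelectSingleNode only returns the first match.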

You can use .SelectNodes() and another foreach loop to edit all occurrences:

# @() collects all matches first, so no node is modified while the list is still being read
foreach ($element in @($document.SelectNodes("//w:t[.='$searchText']", $namespaces))) {
    $element.InnerText = $item.Value
}
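A self-contained sketch of the SelectNodes variant, on the same kind of hypothetical stand-in document. It collects the matches with @() before changing any of them, since the .NET documentation warns that a node list returned by SelectNodes can give unexpected results once the underlying document changes:

```powershell
# Hypothetical stand-in for document.xml
$sample = @'
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p><w:r><w:t>MyText</w:t></w:r></w:p>
    <w:p><w:r><w:t>MyTexttwo</w:t></w:r></w:p>
    <w:p><w:r><w:t>MyText</w:t></w:r></w:p>
  </w:body>
</w:document>
'@
$document = New-Object xml
$document.LoadXml($sample)

$namespaces = New-Object System.Xml.XmlNamespaceManager $document.NameTable
$namespaces.AddNamespace('w', 'http://schemas.openxmlformats.org/wordprocessingml/2006/main')

$searchText = 'MyText'

# Copy the matching nodes into an array before modifying any of them
$found = @($document.SelectNodes("//w:t[.='$searchText']", $namespaces))
foreach ($element in $found) {
    $element.InnerText = 'Dies ist ein Test'
}

$found.Count   # -> 2 (MyTexttwo is not selected)

# Text of every <w:t>, in document order
$texts = @($document.SelectNodes('//w:t', $namespaces) | ForEach-Object { $_.InnerText })
$texts
```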

Upvotes: 3
