Richard Szalay

Reputation: 84784

Modify xml while preserving whitespace

I'm running into several problems trying to replace an attribute in an XML file while preserving whitespace.

Attempt 1

$xml = [xml](get-content data.xml)
$xml.Path.To.Attribute = $value
set-content data.xml [String]$xml

Result: Insignificant whitespace (namely newlines) is removed

Attempt 2

$xml = new-object xml
$xml.PreserveWhitespace = true
$xml.PreserveWhitespace

Result: PreserveWhitespace remains false

Attempt 3

$xml = get-content data.xml
$xml = [regex]::replace($xml, "pattern", "replacement")
set-content data.xml $xml

Result: [regex]::replace messes up the line endings

Am I taking crazy pills here?

Upvotes: 7

Views: 5220

Answers (4)

mklement0

Reputation: 439597

There's good information in some of the answers, but let me try to provide a systematic summary and to address your own attempts:

  • In order to preserve insignificant whitespace in an XML document (System.Xml.XmlDocument, [xml] in PowerShell) read from a file, .PreserveWhitespace = $true must be set, notably before loading content, such as from a file.

  • However, with a file you must also ensure that the file is read (loaded) and saved correctly:

    • The robust way to read XML files is to use the .Load() method:

      • $xml = [xml]::new(); $xml.Load((Convert-Path -LiteralPath data.xml))

        • Note the use of Convert-Path to ensure that the input file path is a full (absolute) one. This is necessary because .NET's working directory usually differs from PowerShell's.

        • In PowerShell v4-, where the static ::new() method for constructor calls isn't available, use the following instead:
          $xml = New-Object xml; $xml.Load((Convert-Path -LiteralPath data.xml))

      • To be safe, do not read XML files as plain text: Notably, [xml] [System.IO.File]::ReadAllText("data.xml") - from your own answer - and its v3+ PowerShell (near) equivalent[1] - [xml] (Get-Content -Raw -LiteralPath data.xml) are not robust, because they can cause the XML file's character encoding to be misinterpreted, as it is possible for the true encoding to only be detectable via the XML declaration's encoding attribute - see this answer for details.

    • The robust way to save an XML document to a file is to use the .Save() method:

      • $xml.Save((Convert-Path -LiteralPath data.xml))

        • Note: More work is needed when saving to a file that doesn't exist yet, because Convert-Path unfortunately only works with existing paths as of PowerShell 7.3.3 (see GitHub issue #2993, in which a future -SkipPathValidation parameter has been green-lit); e.g.:

           # Note: If there's a chance that the current location isn't a *file-system*
           #       location, replace $PWD.ProviderPath below with 
           #       (Get-Location -PSProvider FileSystem).ProviderPath
          
           # Save to file 'new.xml' in the current location.
           $xml.Save((Join-Path $PWD.ProviderPath new.xml))
          
           # More flexible PowerShell (Core) 7+ alternative:
           $xml.Save([IO.Path]::GetFullPath('new.xml', $PWD.ProviderPath))
          
      • The same considerations apply as for reading XML files:

        • To be safe, do not save an XML document as plain text (such as with Set-Content).
        • As with reading, you could end up with the wrong character encoding; only using .Save() honors the encoding attribute specified in the XML declaration.

To put it all together:

# Construct an empty [xml] instance.
$xml = [xml]::new() # In PSv4-: New-Object xml

# Instruct it to preserve whitespace when content is loaded later,
# as well as on saving.
$xml.PreserveWhitespace = $true

# Load the document from your file
# Note the use of Convert-Path to ensure that a *full* path is used.
$xmlFileFullPath = Convert-Path -LiteralPath data.xml
$xml.Load($xmlFileFullPath)

# ... modify $xml

# Save the modified document back to the file.
# Note: If you were to write to a *different* file, again be 
#       sure to specify a *full* path.
$xml.Save($xmlFileFullPath)
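To check that the recipe above really keeps blank lines and indentation, here is a minimal self-contained sketch. The temp file name demo.xml, its contents, and the config/setting element names are hypothetical; the relevant point is that .PreserveWhitespace is set before .Load(), because .Save() only preserves whitespace when that property is $true (otherwise the document is re-indented on save).

```powershell
# Hypothetical demo file in the temp directory (GetTempPath() returns a full path).
$path = Join-Path ([IO.Path]::GetTempPath()) 'demo.xml'

# Sample document with a blank line and indentation (insignificant whitespace).
$original = "<?xml version=`"1.0`" encoding=`"utf-8`"?>`n<config>`n`n  <setting name=`"mode`" value=`"old`" />`n</config>"
[IO.File]::WriteAllText($path, $original)

$xml = [xml]::new()
$xml.PreserveWhitespace = $true   # must be set BEFORE loading
$xml.Load($path)

# Modify an attribute via PowerShell's XML property adaptation.
$xml.config.setting.value = 'new'

# Save back to the same (full) path, then inspect the raw file text.
$xml.Save($path)
$result = [IO.File]::ReadAllText($path)
$result
```

The blank line between <config> and <setting> should survive the round trip; only the attribute value changes.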

As for what you tried:

Re Attempt 1

$xml = [xml](get-content data.xml)

  • Get-Content by default reads a text file line by line and returns an array of lines, so information about the original newlines is invariably lost in the process.

  • Therefore, this method of loading an XML file is fundamentally unsuited to preserving the original whitespace in the file, as you've discovered yourself. However, as discussed, [xml] [System.IO.File]::ReadAllText("data.xml") and [xml] (Get-Content -Raw -LiteralPath data.xml) aren't fully robust either - use .Load() instead.

Apart from that, preserving the original whitespace requires opt-in, which the idiom [xml] (<# XML text, possibly from a file #>) doesn't support, given that the [xml] instance's .PreserveWhitespace property must be set to $true before content is loaded.
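Both points can be seen in a short sketch (the temp file lines.xml is hypothetical): plain Get-Content yields an array with the newlines stripped, -Raw yields one string with them intact, and the [xml] cast idiom leaves .PreserveWhitespace at its default of $false.

```powershell
# Hypothetical sample file, written with explicit LF newlines and a blank line.
$path = Join-Path ([IO.Path]::GetTempPath()) 'lines.xml'
[IO.File]::WriteAllText($path, "<root>`n`n  <a />`n</root>")

# Default Get-Content: one string per line, newlines stripped.
$lines = Get-Content -LiteralPath $path
$lines.GetType().Name   # Object[]
$lines.Count            # 4 (the blank line is an empty string element)

# -Raw (v3+): a single string with the original newlines intact.
$raw = Get-Content -Raw -LiteralPath $path
$raw.Contains("`n`n")   # True

# The [xml] cast idiom offers no way to opt in to whitespace preservation.
([xml] $raw).PreserveWhitespace   # False
```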

set-content data.xml [String]$xml

As discussed, Set-Content also isn't a robust way to save an XML document to a file. Even if no encoding problems happen to arise, the absence of -NoNewline (v5+) would result in a platform-native newline getting appended to the file, which may be at odds with the file's original newline format.

Additionally, [String]$xml does not return the XML text of an [xml] instance - you need .OuterXml for that.
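A minimal sketch of getting the XML text via .OuterXml, using an inline sample document (hypothetical content). It also contrasts a whitespace-preserving instance with the [xml] cast, whose whitespace-only nodes are discarded on load.

```powershell
$sample = "<root>`n  <item id=`"1`" />`n</root>"

# Opt in to whitespace preservation before loading.
$xml = [xml]::new()
$xml.PreserveWhitespace = $true
$xml.LoadXml($sample)

# .OuterXml returns the document's XML text, whitespace nodes included.
$text = $xml.OuterXml
$text

# The [xml] cast discards the whitespace-only nodes between elements.
$plain = [xml] $sample
$plain.OuterXml
```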

Re Attempt 2

$xml.PreserveWhitespace = true

This is a simple syntax problem:

  • PowerShell's Boolean ([bool]) constants are $true and $false, so true should be $true

  • Neglecting to use $ does not cause a syntax error, however: it causes true to be interpreted as a command (a PowerShell cmdlet, script, function, external program, ...), and if there is none by that name,[2] an unrecognized-command error is emitted that terminates the statement, so that no property assignment takes place.
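With the $ added, Attempt 2 behaves as expected; a minimal sketch:

```powershell
$xml = New-Object xml
$xml.PreserveWhitespace = $true   # the Boolean constant $true, not the bare word true
$xml.PreserveWhitespace           # True
```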

Re Attempt 3

Result: [regex]::replace messes up the line endings

No: [regex]::Replace() has no effect on line endings (newlines).
(As an aside: consider using PowerShell's -replace operator instead.)

Instead, the problem - loss of newlines due to creating an array of lines - occurred earlier, in your Get-Content call, as previously discussed.
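A small sketch (sample strings are hypothetical) shows both halves of this: given a single string, [regex]::Replace() and -replace leave CRLF sequences untouched; given an array of lines, such as plain Get-Content output, PowerShell first converts the array to one string by joining its elements with spaces (assuming the default $OFS), which is where the line structure disappears.

```powershell
# A single string containing CRLF line endings.
$text = "<a value=`"old`" />`r`n<b />`r`n"

$viaRegex   = [regex]::Replace($text, 'old', 'new')
$viaReplace = $text -replace 'old', 'new'   # PowerShell's operator equivalent

# Neither call touches the CRLF sequences.
$viaRegex -eq "<a value=`"new`" />`r`n<b />`r`n"   # True
$viaReplace -ceq $viaRegex                         # True

# An array passed to a [string] parameter is joined with spaces first.
$lines  = 'line1', 'line2'
$joined = [regex]::Replace($lines, 'x', 'y')
$joined                                           # line1 line2
```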


[1] It is only fully equivalent in PowerShell (Core) 7+, which - like .NET APIs - defaults to (BOM-less) UTF-8. Windows PowerShell, by contrast, assumes ANSI encoding when reading a file without a BOM.

[2] On Unix-like platforms, there actually is an external program named true. It produces no output, which becomes $false when PowerShell coerces it to a [bool].

Upvotes: 0

mjolinor

Reputation: 68331

This isn't working because PreserveWhitespace is a Boolean property, and true (without the $) isn't PowerShell's Boolean constant:

$xml = new-object xml
$xml.PreserveWhitespace = true
$xml.PreserveWhitespace

Use:

 $xml.PreserveWhitespace = $true

Upvotes: 8

arielhad

Reputation: 2163

By default, insignificant whitespace such as empty lines is discarded. To preserve it, set the PreserveWhitespace property before reading the file:

Create an XmlDocument object and set PreserveWhitespace:

$xmlDoc = [xml]::new()
$xmlDoc.PreserveWhitespace = $true

Load the document:

$xmlDoc.Load($myFilePath)

or

$xmlDoc.LoadXml($(Get-Content $myFilePath -Raw))

Upvotes: 4

Richard Szalay

Reputation: 84784

The problems were all related: Get-Content returns the lines of the text file as an array, not the text itself. When that array is converted back to a single string, the lines are joined with spaces rather than their original newlines.

The best solution was to use:

$xml = [xml]([System.IO.File]::ReadAllText("data.xml"))

Upvotes: 8
