Reputation: 84784
I'm running into several problems trying to replace an attribute in an XML file while preserving whitespace.
$xml = [xml](get-content data.xml)
$xml.Path.To.Attribute = $value
set-content data.xml [String]$value
Result: Insignificant whitespace (namely newlines) is removed
$xml = new-object xml
$xml.PreserveWhitespace = true
$xml.PreserveWhitespace
Result: PreserveWhitespace remains false
$xml = get-content data.xml
$xml = [regex]::replace($xml, "pattern", "replacement")
set-content data.xml $xml
Result: [regex]::replace messes up the line endings
Am I taking crazy pills here?
Upvotes: 7
Views: 5220
Reputation: 439597
There's good information in some of the answers, but let me try to provide a systematic summary and to address your own attempts:
In order to preserve insignificant whitespace in an XML document (System.Xml.XmlDocument, [xml] in PowerShell) read from a file, .PreserveWhitespace = $true must be set, notably before loading content, such as from a file.
However, with a file you must also ensure that the file is read (loaded) and saved correctly:
The robust way to read XML files is to use the .Load() method:
$xml = [xml]::new(); $xml.Load((Convert-Path -LiteralPath data.xml))
Note the need to use Convert-Path to ensure that the input file path is a full (absolute) one, which is necessary because .NET's working directory usually differs from PowerShell's.
In PowerShell v4-, where the static ::new() method for constructor calls isn't available, use the following instead:
$xml = New-Object xml; $xml.Load((Convert-Path -LiteralPath data.xml))
To be safe, do not read XML files as plain text: notably, [xml] [System.IO.File]::ReadAllText("data.xml") - from your own answer - and its v3+ PowerShell (near) equivalent[1], [xml] (Get-Content -Raw -LiteralPath data.xml), are not robust, because they can cause the XML file's character encoding to be misinterpreted: the true encoding may be detectable only via the XML declaration's encoding attribute - see this answer for details.
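To make the encoding pitfall concrete, here is a minimal sketch. The file name, the element name and the `label` attribute are made up for illustration; it assumes a BOM-less ISO-8859-1 file whose encoding is announced only in the XML declaration:

```powershell
# Hypothetical sample file in the temp directory, written as BOM-less ISO-8859-1.
$path   = Join-Path ([IO.Path]::GetTempPath()) 'latin1-demo.xml'
$eAcute = [char] 0xE9   # 'é', built from its code point so the script's own encoding doesn't matter
$text   = '<?xml version="1.0" encoding="iso-8859-1"?><root label="caf' + $eAcute + '" />'
[IO.File]::WriteAllText($path, $text, [Text.Encoding]::GetEncoding('iso-8859-1'))

# .Load() honors the encoding declared in the XML declaration.
$viaLoad = New-Object xml
$viaLoad.Load($path)
$labelViaLoad = $viaLoad.DocumentElement.GetAttribute('label')

# Reading as plain text decodes the bytes as UTF-8 *before* the XML parser sees them,
# so the 0xE9 byte can no longer be recovered as 'é'.
$viaText = [xml] [IO.File]::ReadAllText($path)
$labelViaText = $viaText.DocumentElement.GetAttribute('label')

$labelViaLoad -eq ('caf' + $eAcute)   # the .Load() route keeps the character
$labelViaText -eq ('caf' + $eAcute)   # the plain-text route does not
```

With pure-ASCII content or a file that has a BOM you won't see a difference, which is why the plain-text approach often appears to work.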
The robust way to save an XML document to a file is to use the .Save() method:
$xml.Save((Convert-Path -LiteralPath data.xml))
Note: More work is needed when saving to a file that doesn't exist yet, because Convert-Path unfortunately only works with existing paths as of PowerShell 7.3.3 (see GitHub issue #2993, in which a future -SkipPathValidation parameter has been green-lit); e.g.:
# Note: If there's a chance that the current location isn't a *file-system*
# location, replace $PWD.ProviderPath below with
# (Get-Location -PSProvider FileSystem).ProviderPath
# Save to file 'new.xml' in the current location.
$xml.Save((Join-Path $PWD.ProviderPath new.xml))
# More flexible PowerShell (Core) 7+ alternative:
$xml.Save([IO.Path]::GetFullPath('new.xml', $PWD.ProviderPath))
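Another option, which should also work in Windows PowerShell, is to let PowerShell's own path API resolve a path that need not exist yet. This is only a sketch: `Resolve-FullPath` is a hypothetical helper name, and it assumes the current location is a file-system location:

```powershell
# Hypothetical helper: resolve a (possibly non-existent) path to a full,
# provider-native path, relative to PowerShell's current location.
function Resolve-FullPath {
    param([Parameter(Mandatory)] [string] $Path)
    $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Path)
}

$target = Resolve-FullPath 'new.xml'
# $xml.Save($target)   # pass the resolved full path to .Save()
```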
The same argument applies as to reading XML files: saving the XML text as plain text (e.g., with Set-Content) is not robust, whereas .Save() honors the encoding attribute specified in the XML declaration.
To put it all together:
# Construct an empty [xml] instance.
$xml = [xml]::new() # In PSv4-: New-Object xml
# Instruct it to preserve whitespace when content is loaded later,
# as well as on saving.
$xml.PreserveWhitespace = $true
# Load the document from your file
# Note the use of Convert-Path to ensure that a *full* path is used.
$xmlFileFullPath = Convert-Path -LiteralPath data.xml
$xml.Load($xmlFileFullPath)
# ... modify $xml
# Save the modified document back to the file.
# Note: If you were to write to a *different* file, again be
# sure to specify a *full* path.
$xml.Save($xmlFileFullPath)
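For a quick end-to-end check of the above, here is a self-contained sketch that changes one attribute and then verifies that the blank line and indentation survive. The sample file, the element names and the attribute name are all invented for the demonstration:

```powershell
# Hypothetical sample document with CRLF newlines and an empty line inside <root>.
$path = Join-Path ([IO.Path]::GetTempPath()) 'preserve-demo.xml'
$original = "<?xml version=`"1.0`" encoding=`"utf-8`"?>`r`n<root>`r`n`r`n  <item name=`"old`" />`r`n</root>`r`n"
[IO.File]::WriteAllText($path, $original)

$xml = New-Object xml            # or [xml]::new() in v5+
$xml.PreserveWhitespace = $true  # must be set BEFORE loading
$xml.Load($path)                 # $path is already a full path here

# Replace the attribute value; XPath avoids any ambiguity with dot notation.
$xml.SelectSingleNode('/root/item').SetAttribute('name', 'new')
$xml.Save($path)

$result = [IO.File]::ReadAllText($path)
# Check that the empty line and the indentation before <item> are still there.
$result.Contains("<root>`r`n`r`n  <item name=`"new`"")
```

If you comment out the PreserveWhitespace line and rerun, .Save() pretty-prints the document using its own indentation instead.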
As for what you tried:
$xml = [xml](get-content data.xml)
Get-Content by default reads a text file line by line (returning an array of lines), so information about the original newlines is invariably lost in the process.
Therefore, this method of loading an XML file is fundamentally unsuited to preserving the original whitespace in the file, as you've discovered yourself. However, as discussed, [xml] [System.IO.File]::ReadAllText("data.xml") and [xml] (Get-Content -Raw -LiteralPath data.xml) aren't fully robust either - use .Load() instead.
Apart from that, preserving the original whitespace requires opt-in, which the idiom [xml] (<# XML text, possibly from a file #>) doesn't support, given that the [xml] instance's .PreserveWhitespace property must be set to $true before content is loaded.
set-content data.xml [String]$xml
As discussed, Set-Content also isn't a robust way to save an XML document to a file. Even if no encoding problems happen to arise, the absence of -NoNewLine (v5+) would result in a platform-native newline getting appended to the file, which may be at odds with the file's original newline format.
Additionally, [String]$xml does not return the XML text of an [xml] instance - you need .OuterXml for that.
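A tiny illustration of the .OuterXml point, using a made-up in-memory document:

```powershell
# Hypothetical in-memory document, just to show where the XML text lives.
$doc = [xml] '<root><item id="1" /></root>'

$asString = [string] $doc   # NOT the document's markup
$asXml    = $doc.OuterXml   # the serialized XML text

$asXml
```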
$xml.PreserveWhitespace = true
This is a simple syntax problem:
PowerShell's Boolean ([bool]) constants are $true and $false, so true should be $true.
Neglecting to use $ does not cause a syntax error, however: it causes true to be interpreted as a command (a PowerShell cmdlet, script, function, external program, ...), and if there is none by that name,[2] an unrecognized-command error is emitted that terminates the statement, so that no property assignment takes place.
Result: [regex]::replace messes up the line endings
No: [regex]::Replace() has no effect on line endings (newlines).
(As an aside: consider using PowerShell's -replace operator instead.)
Instead, the problem - loss of newlines due to creating an array of lines - occurred earlier, in your Get-Content call, as previously discussed.
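You can see where the newlines actually go with a small experiment. The file name and content are made up, and -Raw requires v3+:

```powershell
# Hypothetical three-line sample file with CRLF newlines.
$path = Join-Path ([IO.Path]::GetTempPath()) 'newline-demo.txt'
[IO.File]::WriteAllText($path, "<a>`r`n  <b/>`r`n</a>`r`n")

$lines = Get-Content -LiteralPath $path        # array of lines, newlines already stripped
$raw   = Get-Content -Raw -LiteralPath $path   # one string, newlines intact

# Passing the array makes PowerShell join its elements with spaces
# before [regex]::Replace() ever sees the text.
$fromLines = [regex]::Replace($lines, 'b', 'c')

# Working on the raw string (here with -replace) leaves the newlines alone.
$fromRaw = $raw -replace 'b', 'c'

$lines.Count                  # number of lines read
$fromLines.Contains("`n")     # are any newlines left in the array-based result?
$fromRaw.Contains("`r`n")     # are the CRLFs still present in the raw-based result?
```

This is for plain-text experiments only; for XML files, .Load() and .Save() remain the robust choice.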
[1] It is only fully equivalent in PowerShell (Core) 7+, which - like .NET APIs - defaults to (BOM-less) UTF-8. Windows PowerShell, by contrast, assumes ANSI encoding when reading a file without a BOM.
[2] On Unix-like platforms, there actually is an external program named true, which produces no output, which - when PowerShell coerces that to a [bool] - becomes $false.
Upvotes: 0
Reputation: 68331
This isn't working because PreserveWhiteSpace is a boolean:
$xml = new-object xml
$xml.PreserveWhitespace = true
$xml.PreserveWhitespace
Use:
$xml.PreserveWhitespace = $true
Upvotes: 8
Reputation: 2163
By default, empty lines are ignored. To preserve them, set the PreserveWhitespace property before reading the file:
Create XmlDocument object and configure PreserveWhitespace:
$xmlDoc = [xml]::new()
$xmlDoc.PreserveWhitespace = $true
Load the document:
$xmlDoc.Load($myFilePath)
or
$xmlDoc.LoadXml($(Get-Content $myFilePath -Raw))
Upvotes: 4
Reputation: 84784
The problems were all related: Get-Content returns lines of the text file, not the text itself. When cast back to a string, the lines are combined outright.
The best solution was to use:
$xml = [xml]([System.IO.File]::ReadAllText("data.xml"))
Upvotes: 8