zoomzoomvince

Reputation: 247

PowerShell - XML

I have an input XML file that uses HTML-style named entities for some characters, e.g. &quot; for a double quote, &apos; for a single quote and &amp; for an ampersand:

<Notes>Double Quote &quot; Single Quote &apos; Ampersand &amp;</Notes>

Before

<?xml version="1.0" encoding="UTF-8"?>
<OrganisationUnits>
  <OrganisationUnitsRow num="8">
    <OrganisationId>ACME24/7HOME</OrganisationId>
    <OrganisationName>ACME LTD</OrganisationName>
    <Notes>Double Quote &quot; Single Quote &apos; Ampersand &amp; </Notes>
    <Sector>P</Sector>
    <SectorDesc>Private Private &amp; Voluntary</SectorDesc>
  </OrganisationUnitsRow>
</OrganisationUnits>

After

<?xml version="1.0" encoding="UTF-8"?>
<OrganisationUnits>
  <OrganisationUnitsRow num="8">
    <OrganisationId>ACME24/7HOME</OrganisationId>
    <OrganisationName>ACME LTD</OrganisationName>
    <Notes>Double Quote " Single Quote ' Ampersand &amp; </Notes>
    <Sector>P</Sector>
    <SectorDesc>Private Private &amp; Voluntary</SectorDesc>
  </OrganisationUnitsRow>
</OrganisationUnits>

I load the file as XML and process each row; nothing fancy, and the processing itself works fine:

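# Resolve to an absolute path: .NET methods such as Save() don't follow PowerShell's current location
$fullPath = (Resolve-Path (Join-Path $path $File)).Path
# -Raw reads the file as one string; -Encoding UTF8 matches the declared encoding
$xml = [xml](Get-Content -Path $fullPath -Raw -Encoding UTF8)
foreach ($CMCAddressesRow in $xml.OrganisationUnits.OrganisationUnitsRow) {
    # ... process each row (details omitted) ...
}
$xml.Save($fullPath)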

When the document is saved, entities such as &quot; and &apos; are replaced by the literal characters " and '. How can I keep the original &quot; entities? And, more importantly, why is this happening?

Upvotes: 1

Views: 347

Answers (1)

Ansgar Wiechers

Reputation: 200503

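What you're referring to are called character entity references (&quot;, &apos;, &amp;, &lt; and &gt; are the five entities predefined by XML itself, not HTML-specific names). When the file is loaded into an [xml] (System.Xml.XmlDocument) object, the XML parser expands them, so you work with the actual characters they represent. On export, only characters that must be escaped in the XML, such as & and <, are encoded again. Quotation characters don't need to be encoded in a node value (only inside an attribute value delimited by the same quote character), so they're not encoded on export. The loaded document keeps no record of which characters were originally written as entities, so Save() cannot restore them.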
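A quick way to see this round trip is to parse a one-line fragment and compare the node value with the re-serialized XML. This is a minimal sketch, assuming Windows PowerShell 5.1 or PowerShell 7 (both back [xml] with System.Xml.XmlDocument); the fragments and the variable names $xml and $attr are illustrative only.

```powershell
# Parse a fragment that uses XML's predefined entities (&quot; &apos; &amp;).
$xml = [xml]'<Notes>Double Quote &quot; Single Quote &apos; Ampersand &amp;</Notes>'

# On import the parser has already expanded the entities,
# so the node value holds the literal characters.
$xml.Notes
# Double Quote " Single Quote ' Ampersand &

# On export only what must be escaped in element text is re-encoded:
# & becomes &amp; again, but " and ' are written as-is.
$xml.OuterXml
# <Notes>Double Quote " Single Quote ' Ampersand &amp;</Notes>

# Inside a double-quoted attribute value, " must be escaped,
# so the serializer writes &quot; there.
$attr = [xml]'<Row note="say &quot;hi&quot;"/>'
$attr.OuterXml
```

The same thing happens to the OrganisationUnits file: by the time the foreach loop runs, the Notes values already contain plain " and ' characters.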

Upvotes: 2
