Reputation: 31
I wrote a script that reads XML files, edits some specific nodes, and then writes each file back out.
The issue I am having is that the output file has extra characters added to nodes that I didn't edit.
I assume this is an encoding issue.
The relevant code from my script is:
function getAssigneeID ($assigneeName) {
#param($assigneeName)
#write($assigneeName)
$assigneeID = $nameIDHash[$assigneeName]
if ($assigneeID -eq $null -or $assigneeID -eq "") {
return 'Not Found'
} else {
return $assigneeID
}
}
function ValidateAssigneeField ($assignee, $fileContent, $fileURI, $assignees) {
If (($assignee.InnerText.Length -le 2) -or ($assignee.InnerText.Length -ne 6) -or ($assignee.InnerText[1] -ne 'Z')) {
write("`tAssignee " + $assignee.InnerText + " is invalid.") >> $Output_Log_File
#find assignee's ID in nameIDHash
$assigneeID = getAssigneeID -assigneeName $assignee.InnerText
if ($assigneeID -eq 'Not Found' -or $assigneeID -eq $null){
write("`t`tThe ID for the invalid user " + $assignee.InnerText + " is Not Found.") >> $Output_Log_File
} else {
#if the assigneeID is in the list of assignees, remove the name, otherwise replace the name with the ID and save the file.
write("`t`tThe ID for the invalid user " + $assignee.InnerText + " is " + $assigneeID) >> $Output_Log_File
$assigneeIdAlreadyInList = $false
foreach ($user in $assignees){
#write("user = " + $user.InnerText + ", ID = " + $assigneeID)
If ($user.InnerText -eq $assigneeID){
$assigneeIdAlreadyInList = $true
} else {
}
}
#write ($assigneeIdAlreadyInList)
if ($assigneeIdAlreadyInList){
write("`t`t" + $assigneeID + " already exists in the assignee list, removing " + $assignee.InnerText) >> $Output_Log_File
[void]$assignee.ParentNode.RemoveChild($assignee)
} else {
write("`t`tReplacing " + $assignee.InnerText + " with " + $assigneeID + ".") >> $Output_Log_File
$assignee.InnerText = $assigneeID
}
write("`t`tSaving the file " + $fileURI + ".") >> $Output_Log_File
#$fileContent.save($fileURI)
#$file | out-file -Encoding "UTF8" -FilePath $fileURI
#$MyXML | out-file -Encoding "UTF8" -FilePath $fileURI
#$fileContent | out-file -Encoding "UTF8" -FilePath $fileURI
}
} else {
write("`tAssignee " + $assignee.InnerText + " is OK.") >> $Output_Log_File
}
}
$workitemBasePath = "C:\temp\dev\workitems\Dev_ECH\"
$Output_Log_File = "C:\ALM\Reports\Dev_ECH - Correct All Assignees.txt"
$NameIDHash = @{
"Jade West" = "zzzzzz"
"Tonya Killebrew" = "AZCJNZ"}
$today = Get-Date -format s
write($today + " - Running Correct invalid Assignees.ps1.") > $Output_Log_File
$files = Get-ChildItem -Path $workitemBasePath -include workitem.xml -Recurse | % { $_.FullName }
foreach ($file in $files){
write("Evaluating " + $file) >> $Output_Log_File
[xml]$MyXML = Get-Content $file
$assigneeList = $MyXML.SelectNodes('//work-item/field[@id="assignee"]/list/item')
if ($assigneeList.count -eq 0) {
$assigneeList = $MyXML.SelectNodes('//work-item/field[@id="assignee"]')
}
foreach ($assignee in $assigneeList) {
ValidateAssigneeField -assignee $assignee -fileContent $MyXML -fileURI $file -assignees $assigneeList
}
}
Then, in ValidateAssigneeField, I edit the assignee node and save the file with:
$fileContent.save($fileURI)
In the output XML file I see extra characters added to some of the text fields, for example:
<field id="description" text-type="text/plain">​Navistar has reported that the transmission remains in Drive when the operator selects a fast sequence from Drive to Reverse to Manual mode. When selecting a similar sequence from Reverse to Drive to Manual mode, the transmission drive as expected.</field>
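The extra characters are added in seemingly random places, for example at the very start of the description text above (they may not render visibly here).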
I assume I need to find out what encoding the original XML is in and then write my edited XML in the same encoding.
How do I change the output encoding of the $fileContent.save($fileURI) call?
<?xml version="1.0" encoding="UTF-8"?>
<work-item>
<field id="assignee">Jade West</field>
<field id="author">RZPRRK</field>
<field id="created">2019-08-08 10:41:39.163 -0400</field>
<field id="description" text-type="text/html">Tst</field>
<field id="dueDate">2019-08-05</field>
<field id="nextReviewDate" type="date">2019-08-15</field>
<field id="osNumber" type="string">23457</field>
<field id="osOpenDate" type="date">2019-07-30</field>
<field id="previousStatus">toBeScreened</field>
<field id="priority">2.0</field>
<field id="rational" text-type="text/html" type="text/html">Test</field>
<field id="release" type="enum:release">na</field>
<field id="resolution">duplicate</field>
<field id="resolvedOn">2019-08-08 10:42:22.987 -0400</field>
<field id="severity">normal</field>
<field id="status">inProcess</field>
<field id="title">HWCR - Reject</field>
<field id="type">hardwareChangeRequest</field>
</work-item>
<?xml version="1.0" encoding="UTF-8"?>
<work-item>
<field id="assignee">
<list>
<item>XZM030</item>
</list>
</field>
<field id="author">XZM030</field>
<field id="automatedTestAffected" type="enum:productDocumentAffected">notRequired</field>
<field id="created">2019-06-06 13:59:27.726 -0400</field>
<field id="customerImpact" type="enum:productGenricYesNo">yes</field>
<field id="customerImpactNotes" text-type="text/plain" type="text/html">See description</field>
<field id="cyberSecurityAffected" type="enum:productGenricYesNo">no</field>
<field id="datalinkTechData" type="enum:productDocumentAffected">notRequired</field>
<field id="description" text-type="text/plain">​Navistar has reported that the transmission remains in Drive when the operator selects a fast sequence from Drive to Reverse to Manual mode. When selecting a similar sequence from Reverse to Drive to Manual mode, the transmission drive as expected.</field>
<field id="designReviewComments" text-type="text/plain" type="text/html">​3/4/19-accepted with addtions to test plan</field>
<field id="designReviewRequired" type="enum:productDocumentCompleted">completed</field>
<field id="designedDate" type="date">2019-03-26</field>
<field id="diagAffected" type="enum:productDiagAffected">no</field>
<field id="fmeaRequired" type="enum:productDocumentAffected">notRequired</field>
<field id="functionalSafetyAffected" type="enum:productGenricYesNo">no</field>
<field id="linkedWorkItems">
<list>
<struct>
<item id="role">affected_by</item>
<item id="workItem">COMM-47223</item>
</struct>
</list>
</field>
<field id="priority">4.0</field>
<field id="release" type="enum:release">na</field>
<field id="requirementsAffected" type="enum:productDocumentAffected">notRequired</field>
<field id="rootCauseDescription" text-type="text/plain" type="text/html">​The TCM logic that controls express preselect for the hold postion looks at if forward is attained but not the currently selected postion.  Therefore with a quick transistion from D-R-H the transmission does not have time to actually make a shift to Reverse and there for the forward attined is still true when hold is recieved. </field>
<field id="screenedDate" type="date">2019-03-26</field>
<field id="serviceImpact" type="enum:productGenricYesNo">yes</field>
<field id="serviceImpactNotes" text-type="text/plain" type="text/html">affects OEMs using the non-ATI standard selector interface only. OEMs using the non- ATI basic selector interface are not effected.</field>
<field id="severity">normal</field>
<field id="sharePointID" type="string">2884</field>
<field id="simToolAffected" type="enum:productDocumentAffected">notRequired</field>
<field id="softwareCRIsRequired" type="boolean">true</field>
<field id="solutionDescription" text-type="text/plain" type="text/html">​The TCM logic that controls express preselect for the hold postion needs to look at the selected position and if forward is attined.</field>
<field id="status">na</field>
<field id="synergyCRNumber" type="string">,10516,</field>
<field id="syscrType" type="string">Incident</field>
<field id="techData(Regular)" type="enum:productDocumentAffected">notRequired</field>
<field id="tempStatus" type="string">n/a</field>
<field id="testPlanAffected" type="enum:productDocumentAffected">notRequired</field>
<field id="testRunWhereValidated" type="string">BCD 191 PC</field>
<field id="title">Other: OEM Standard Shift Selector D-to-R-to-Manual Transition Complaint</field>
<field id="type">other</field>
<field id="typeForDependencyOnly" type="enum:otherProductDependencyType">other</field>
<field id="vepsqaAffected" type="enum:vepsAffected">no</field>
</work-item>
Upvotes: 2
Views: 2483
Reputation: 27428
I would advise against using ">>" or "Out-File -Append". They can mix different encodings in the same file, especially since Out-File defaults to Unicode (UTF-16) in Windows PowerShell. "Add-Content" works better. Bug report: https://github.com/PowerShell/PowerShell/issues/9423
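As a rough illustration of that advice, the sketch below builds a log with Set-Content and Add-Content, passing the same -Encoding on every call instead of relying on ">" and ">>". The log path is a temp file made up for the demo; in the question's script it would be $Output_Log_File.

```powershell
# Hypothetical log path for this demo; the original script uses $Output_Log_File.
$logFile = Join-Path ([System.IO.Path]::GetTempPath()) 'assignee-demo.log'

# Start the log fresh with a known encoding (stands in for `write(...) > $file`).
$today = Get-Date -Format s
Set-Content -Path $logFile -Value "$today - Running demo." -Encoding UTF8

# Append with the same encoding on every call (stands in for `write(...) >> $file`).
Add-Content -Path $logFile -Value "`tAssignee Jade West is invalid." -Encoding UTF8
Add-Content -Path $logFile -Value "`t`tReplacing Jade West with zzzzzz." -Encoding UTF8

# Read the log back with the matching encoding to confirm the lines are intact.
$logLines = Get-Content -Path $logFile -Encoding UTF8
$logLines
```

Being explicit about the encoding on both the first write and every append is the point here; which encoding you pick matters less than using one consistently.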
Upvotes: 0
Reputation: 97
Without an input file to test with, I can only suggest trying the change below:
[xml]$MyXML = Get-Content $file -Raw
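Note that -Raw changes how the lines are joined, not how the bytes are decoded. One possible cause of the symptom, assuming Windows PowerShell 5.1 and UTF-8 input files without a byte order mark (neither is stated in the question), is that Get-Content decodes such a file with the system ANSI code page, so multi-byte characters are already garbled before the XML is parsed. A minimal sketch of an alternative is to let the XML parser open the file itself with XmlDocument.Load(). The file path and content below are made up for the demo.

```powershell
# Demo input: a BOM-less UTF-8 file containing U+200B (zero-width space).
# The path and the XML content are made up for this sketch.
$demoPath  = Join-Path ([System.IO.Path]::GetTempPath()) 'workitem-demo.xml'
$zwsp      = [char]0x200B
$demoXml   = '<?xml version="1.0" encoding="UTF-8"?>' +
             '<work-item><field id="assignee">Jade West</field>' +
             '<field id="description">' + $zwsp + 'Some text</field></work-item>'
$utf8NoBom = New-Object System.Text.UTF8Encoding($false)
[System.IO.File]::WriteAllText($demoPath, $demoXml, $utf8NoBom)

# Let the XML parser read the file directly, so the bytes are decoded using
# the encoding named in the XML declaration rather than Get-Content's default.
$doc = New-Object System.Xml.XmlDocument
$doc.PreserveWhitespace = $true   # keep the original whitespace on save
$doc.Load($demoPath)

# Edit a node and save; Save() uses the encoding from the XML declaration.
$doc.SelectSingleNode('//field[@id="assignee"]').InnerText = 'zzzzzz'
$doc.Save($demoPath)

# Reload and inspect the first character of the description field.
$check = New-Object System.Xml.XmlDocument
$check.Load($demoPath)
$description = $check.SelectSingleNode('//field[@id="description"]').InnerText
[int]$description[0]   # code point of the first character
```

If you prefer to keep Get-Content, passing -Encoding UTF8 alongside -Raw should address the same decoding step, under the same assumptions.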
EDIT: You can also write the output with an explicit encoding:
$file | Out-File -Encoding "UTF8"
EDIT 2:
What if you do the following?
$newfile = ValidateAssigneeField -assignee $assignee -fileContent $MyXML -fileURI $file -assignees $assigneeList
$newfile | Out-File -Encoding "UTF8" -FilePath "DESTINATION"
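An alternative to piping through Out-File is to keep calling Save() on the XML document but hand it a writer whose encoding you control. The sketch below assumes you want UTF-8 without a byte order mark; the document content and output path are made up for the demo, and in the question's script $doc would be $fileContent.

```powershell
# Build a small document in memory; the content is made up for this sketch.
$doc = New-Object System.Xml.XmlDocument
$doc.LoadXml('<work-item><field id="assignee">zzzzzz</field></work-item>')

# Hypothetical output path for the demo.
$outPath = Join-Path ([System.IO.Path]::GetTempPath()) 'workitem-out.xml'

# Choose the output encoding explicitly: UTF-8 without a byte order mark.
$settings = New-Object System.Xml.XmlWriterSettings
$settings.Encoding = New-Object System.Text.UTF8Encoding($false)
$settings.Indent   = $true

# The XmlWriter emits an XML declaration that names the chosen encoding.
$writer = [System.Xml.XmlWriter]::Create($outPath, $settings)
try {
    $doc.Save($writer)
} finally {
    $writer.Dispose()
}

# The first three bytes are EF BB BF only when a BOM was written.
$bytes  = [System.IO.File]::ReadAllBytes($outPath)
$hasBom = ($bytes.Length -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF)
$hasBom
```

Swapping in a different System.Text.Encoding object on $settings.Encoding is how you would target another output encoding with the same pattern.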
Upvotes: 1