Reputation: 4435
I'm trying to parse the following XML from a file using PowerShell without loading it as an XML document via [xml], since the document contains errors.
<data>
<company>Walter & Cooper</company>
<contact_name>Patrick O'Brian</contact_name>
</data>
To load the document successfully I need to fix the errors by replacing special characters as follows:
& with &amp;
< with &lt;
' with &apos; etc.
I know I could do something like this to find and replace characters in a document
(Get-Content $fileName) | Foreach-Object {
$_ -replace '&', '&amp;' `
-replace "'", '&apos;' `
-replace '"', '&quot;'} | Set-Content $fileName
But this will replace characters everywhere in the file. I'm only interested in checking for characters inside XML tags like <company> and replacing them with XML-safe entities, so that the resulting text is a valid document which I can load using [xml].
Upvotes: 2
Views: 3227
Reputation: 47219
Something like this should work for each character you need to replace:
$_ -replace '(?<=\W)(&)(?=.*<\/.*>)', '&amp;' `
-replace '(?<=\W)('')(?=.*<\/.*>)', '&apos;' `
-replace '(?<=\W)(")(?=.*<\/.*>)', '&quot;' `
-replace '(?<=\W)(>)(?=.*<\/.*>)', '&gt;' `
-replace '(?<=\W)(\*)(?=.*<\/.*>)', '∗' } | Set-Content $fileName
Each pattern does a positive look-behind for a non-word character, then the capturing group for the character itself, followed by a positive look-ahead that requires a closing tag later on the same line.
examples:
updated: http://regex101.com/r/aY8iV3 | original: http://regex101.com/r/yO7wB1
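To see what the look-behind and look-ahead actually accept, here is a small check on single lines. This is only a sketch: the variable names and sample strings are mine, not part of the answer above. Note that the `\W` look-behind needs a non-word character directly before the match, so an apostrophe that follows a letter (as in O'Brian) is left alone.

```powershell
# Pattern from the answer: '&' preceded by a non-word character and
# followed, somewhere later on the line, by a closing tag.
$ampPattern = '(?<=\W)(&)(?=.*<\/.*>)'

# Inside an element: the closing tag satisfies the look-ahead.
$inside  = '<company>Walter & Cooper</company>' -replace $ampPattern, '&amp;'

# No closing tag on the line: the look-ahead fails, nothing is replaced.
$outside = 'Walter & Cooper' -replace $ampPattern, '&amp;'

# Apostrophe directly after a letter: the \W look-behind fails.
$name = "<contact_name>Patrick O'Brian</contact_name>" -replace "(?<=\W)(')(?=.*<\/.*>)", '&apos;'

$inside    # <company>Walter &amp; Cooper</company>
$outside   # Walter & Cooper
$name      # <contact_name>Patrick O'Brian</contact_name>
```

If apostrophes after letters need escaping too, dropping the look-behind for that one pattern is probably the simplest adjustment.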
Upvotes: 2
Reputation: 201822
A little bit of regex look-behind and look-ahead should do the trick:
$str = @'
<data>
<company>Walter & Cooper & Brannigan</company>
<contact_name>Patrick & O'Brian</contact_name>
</data>
'@
$str -replace '(?is)(?<=<company>.*?)&(?=.*?</company>)', '&amp;'
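If you don't want one pattern per tag name, the same idea can probably be generalized with a match evaluator: match the text of any leaf element, then escape only inside that match. A minimal sketch, assuming the text itself never contains a raw `<` or `>` (the `$pattern`, `$fixed` and `$doc` names are just for illustration):

```powershell
$str = @'
<data>
<company>Walter & Cooper</company>
<contact_name>Patrick O'Brian</contact_name>
</data>
'@

# Text of a leaf element: everything between a '>' and the next '</'.
$pattern = '(?<=>)[^<>]+(?=</)'

$fixed = [regex]::Replace($str, $pattern, {
    param($m)
    # Escape bare ampersands first, skipping ones that already start an
    # entity, so running the script twice does not produce '&amp;amp;'.
    $m.Value -replace '&(?!(amp|lt|gt|apos|quot|#\d+|#x[0-9a-f]+);)', '&amp;' `
             -replace "'", '&apos;' `
             -replace '"', '&quot;'
})

# The repaired text now loads as XML, and the parser decodes the entities.
$doc = [xml]$fixed
$doc.data.company        # Walter & Cooper
$doc.data.contact_name   # Patrick O'Brian
```

Tags and attributes are never touched because the replacement only runs on the matched text. A stray `<` or `>` in the data would still break this, so treat it as a starting point rather than a complete fix.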
Upvotes: 1