Reputation: 1240
I have a file that has multiple instances of the following:
<password encrypted="True">271NFANCMnd8BFdERjHoAwEA7BTuX</password>
But for each instance the password is different.
I would like the output to delete the encrypted password:
<password encrypted="True"></password>
What is the best method using PowerShell to loop through all instances of the pattern within the file and output to a new file?
Something like:
gc file1.txt | (regex here) > new_file.txt
where (regex here) is something like:
s/"True">.*<\/pass//
Upvotes: 1
Views: 2913
Reputation: 47832
This one is fairly easy in regex, and you can do it that way, or you can parse it as actual XML, which may be more appropriate. I'll demonstrate both ways. In each case, we'll start with this common bit:
$raw = @"
<xml>
<something>
<password encrypted="True">hudhisd8sd9866786863rt</password>
</something>
<another>
<thing>
<password encrypted="True">nhhs77378hd8y3y8y282yr892</password>
</thing>
</another>
<test>
<password encrypted="False">plain password here</password>
</test>
</xml>
"@
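# Regex, version 1: capture the opening and closing tags and keep only those
$raw -ireplace '(<password encrypted="True">)[^<]+(</password>)', '$1$2'

# Regex, version 2: lookarounds, so only the text between the tags is matched
$raw -ireplace '(?<=<password encrypted="True">).+?(?=</password>)', ''

# XML, version 1: blank every password node, ignoring the encrypted attribute
$xml = [xml]$raw
foreach ($password in $xml.SelectNodes('//password')) {
    $password.InnerText = ''
}

# XML, version 2: blank only the password nodes with encrypted="True"
$xml = [xml]$raw
foreach ($password in $xml.SelectNodes('//password[@encrypted="True"]')) {
    $password.InnerText = ''
}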
(<password encrypted="True">)[^<]+(</password>)
The first regex method uses 2 capture groups to capture the opening and closing tags, and replaces the entire match with those tags (so the middle is omitted).
(?<=<password encrypted="True">).+?(?=</password>)
The second regex method uses positive lookaheads and lookbehinds. It finds 1 or more characters which are preceded by the opening tag and followed by the closing tag. Since lookarounds are zero-width, they are not part of the match, therefore they don't get replaced.
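One detail worth spelling out: -ireplace returns a new string and leaves the original variable untouched, so the result has to be captured or piped somewhere. A minimal sketch, assuming a small inline sample ($sample and $cleaned are names made up for this illustration, not part of the original answer):

```powershell
# Hypothetical sample; not the asker's real file.
$sample = @'
<password encrypted="True">abc123</password>
<password encrypted="False">plain password here</password>
'@

# -ireplace does not modify $sample; it returns the changed text.
$cleaned = $sample -ireplace '(<password encrypted="True">)[^<]+(</password>)', '$1$2'

# Only the encrypted="True" element is emptied; the other line is unchanged.
$cleaned
```

Because the pattern spells out encrypted="True", the encrypted="False" element does not match and keeps its content.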
Here we're using a simple xpath query to find all of the password nodes. We iterate through each one with a foreach loop and set its innerText to an empty string.
The second version checks that the encrypted attribute is set to True and only operates on those.
I personally think that the XML method is more appropriate, because it means you don't have to account for variations in XML syntax so much. You can also more easily account for different attributes specified on the nodes or different attribute values.
By using xpath you have a lot more flexibility than with regex for processing XML.
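The XML snippets change the document in memory only, so it still has to be written out. A rough sketch of one way to do that, assuming a made-up sample document and output name (the config root element, $sample, $node, $outPath and new_file.xml are all illustrative):

```powershell
# Hypothetical sample document; the password elements mirror the ones above.
$sample = @'
<config>
  <password encrypted="True">abc123</password>
  <password encrypted="False">plain password here</password>
</config>
'@

$xml = [xml]$sample

# XPath string comparison is case-sensitive, so this matches "True" but not "true".
foreach ($node in $xml.SelectNodes('//password[@encrypted="True"]')) {
    $node.InnerText = ''
}

# Save() is a .NET method and resolves relative paths against the process
# working directory, which can differ from PowerShell's current location,
# so build a full path first. The output file name is an assumption.
$outPath = Join-Path (Get-Location).ProviderPath 'new_file.xml'
$xml.Save($outPath)
```

Unlike -ireplace, the XPath predicate will not match encrypted="true" in lower case, so check how the attribute is actually written in your file.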
I noticed your sample used gc (short for Get-Content) to read the data. Be aware that by default this reads the file line-by-line, returning an array of strings (one per line) rather than a single string.
You can use this to get your raw content in one string, for conversion to XML or processing by regex:
$raw = Get-Content file1.txt -Raw
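To see the difference for yourself, here is a small sketch that writes a throwaway two-line file to the temp directory and reads it both ways. The file name gc-demo.txt and the variable names are assumptions for the example; -Raw needs PowerShell 3.0 or later.

```powershell
# Create a throwaway two-line file so the sketch is self-contained.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'gc-demo.txt'
Set-Content -Path $path -Value '<a>', '</a>'

$lines = Get-Content -Path $path        # array of strings, one per line
$text  = Get-Content -Path $path -Raw   # one string, newlines included

$lines.Count            # 2
$text.GetType().Name    # String
```

The single string is what the [xml] cast and the regex examples expect.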
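You can write it out pretty easily too. Since -ireplace returns a new string rather than changing $raw, pipe its result to Out-File, and use a different file name (new_file.txt in your example) so the original is kept:
$raw -ireplace '(<password encrypted="True">)[^<]+(</password>)', '$1$2' | Out-File new_file.txt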
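Putting the pieces together for the regex route, the whole read, replace and write cycle could look roughly like this. It is a sketch under assumptions: the files live in the temp directory so the example runs on its own, and $dir, $inFile, $outFile and $cleaned are illustrative names.

```powershell
# Hypothetical paths in the temp directory; substitute your own file1.txt and new_file.txt.
$dir     = [System.IO.Path]::GetTempPath()
$inFile  = Join-Path $dir 'file1.txt'
$outFile = Join-Path $dir 'new_file.txt'

# Stand-in input so the sketch runs on its own.
Set-Content -Path $inFile -Value '<password encrypted="True">abc123</password>'

# Read as one string, blank the encrypted passwords, write to a new file.
$raw     = Get-Content -Path $inFile -Raw
$cleaned = $raw -ireplace '(<password encrypted="True">)[^<]+(</password>)', '$1$2'
$cleaned | Out-File -FilePath $outFile
```

Out-File's default encoding differs between Windows PowerShell and PowerShell 7, so pass -Encoding explicitly if the consumer of the file cares.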
Upvotes: 5