Reputation: 13
We have a program that creates email signatures and stores them in a deployment folder; each signature is then copied to the user's local folder at login. However, when an employee is not assigned to an office, the comma separators for City/State still come along for the ride, as shown in this example:
The problem is that the program's source code cannot be found. Long term, I will rewrite it. Short term, I need a PowerShell script that runs every night and removes the line containing the commas. I found the following solution here on Stack Overflow:
Get-ChildItem C:\temp\emailsigs -Filter *.htm | Foreach-Object{
(Get-Content $_.FullName) |
Foreach-Object {$_ -replace " , , <br />", ""} |
Set-Content $_.FullName
}
This actually works pretty well. But I notice that every signature HTM file (over 1100 of them) gets its timestamp updated, even when only 2 email signatures need the empty comma line removed. Is there a more efficient way to first check whether a file contains the offending commas, replace only in those files, and skip the majority?
Upvotes: 1
Views: 1275
Reputation: 438133
The following PSv5+ solution won't be memory-efficient, but should speed up processing while avoiding rewriting of files that don't need it:
Get-ChildItem C:\temp\emailsigs -Filter *.htm |
ForEach-Object {
$oldContent = Get-Content -Raw $_.FullName
$newContent = $oldContent -replace ' , , <br />'
if ($newContent.Length -lt $oldContent.Length) { # was a replacement performed?
Set-Content $_.FullName -NoNewline -Value $newContent
}
}
-Raw is PSv3+ and reads the entire file as a single string. You could use [System.IO.File]::ReadAllText() instead, but note that it assumes UTF-8 as the encoding in the absence of a BOM, whereas Get-Content assumes "ANSI" encoding[1] (the system's legacy "ANSI" code page), so you may have to specify the encoding explicitly.
Processing each file as a single string speeds up processing (though each file must fit into memory twice).
Because -replace leaves the input string unmodified if the regex doesn't match, we can compare the length of the original contents to the length of the replaced contents to see whether something matched and the file therefore needs rewriting. Thus, we only need a single regex operation per file.
... -replace '...' (i.e., not specifying a replacement string) is equivalent to ... -replace '...', '' and effectively removes what was matched.
-NoNewline requires PSv5+; it prevents an additional newline from getting appended on output. You could use [System.IO.File]::WriteAllText() instead, but note that its default encoding is UTF-8 without a BOM, whereas Set-Content, like Get-Content, defaults to "ANSI" encoding[1].
[1] The above applies to Windows PowerShell. The cross-platform PowerShell Core edition defaults to (BOM-less) UTF-8 as well.
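The notes above mention [System.IO.File]::ReadAllText() and [System.IO.File]::WriteAllText() as alternatives to Get-Content and Set-Content. A minimal sketch of that variant follows, with the encoding passed explicitly so that reading and writing agree. The function name Remove-EmptyCommaLine and the UTF-8 default are assumptions for illustration, not part of the original answer. The sketch also swaps the regex-based -replace for the literal, case-sensitive String.Replace() method, which is enough here because the search text is fixed.

```powershell
# Sketch only: Remove-EmptyCommaLine is a hypothetical helper name.
function Remove-EmptyCommaLine {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [string] $Literal = ' , , <br />',
        # Assumption: the files are UTF-8. Pass the encoding your signature
        # files really use. UTF8Encoding($false) writes no BOM, so a file that
        # had a BOM would be rewritten without one.
        [System.Text.Encoding] $Encoding = (New-Object System.Text.UTF8Encoding $false)
    )
    Get-ChildItem -LiteralPath $Path -Filter *.htm -File | ForEach-Object {
        # Read the whole file as one string.
        $old = [System.IO.File]::ReadAllText($_.FullName, $Encoding)
        # Literal replacement; returns an equal string if nothing matched.
        $new = $old.Replace($Literal, '')
        # Rewrite only if something was removed, so untouched files keep
        # their timestamps.
        if ($new.Length -lt $old.Length) {
            [System.IO.File]::WriteAllText($_.FullName, $new, $Encoding)
            $_.FullName   # output the path of each rewritten file
        }
    }
}
```

Calling Remove-EmptyCommaLine -Path C:\temp\emailsigs would then output only the paths of the files it rewrote, which can be handy for logging a nightly run.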
Upvotes: 2
Reputation: 17472
Another method:
Get-ChildItem C:\temp\emailsigs -file -Filter *.htm | foreach{
$CurrentFile=$_
$Content=Get-Content $CurrentFile.FullName -Encoding UTF8
if ($Content -like '* , , <br />*')
{
$Content.Replace(' , , <br />', '') | Set-Content $CurrentFile.FullName -Encoding UTF8
}
}
I use UTF-8 encoding to preserve diacritics.
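As a possible variation on the check-then-replace idea, Select-String can find the affected files first, so that only those are opened for rewriting. The sketch below is an assumption-laden illustration: the function name Get-SignatureFileWithEmptyCommaLine is hypothetical, and -SimpleMatch is used so the search text is treated literally rather than as a regex.

```powershell
# Sketch only: Get-SignatureFileWithEmptyCommaLine is a hypothetical helper name.
function Get-SignatureFileWithEmptyCommaLine {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [string] $Literal = ' , , <br />'
    )
    Get-ChildItem -LiteralPath $Path -Filter *.htm -File |
        # -List stops at the first match in each file; -SimpleMatch disables regex.
        Select-String -SimpleMatch -Pattern $Literal -List |
        ForEach-Object { $_.Path }   # full path of each file that contains the text
}

# Possible usage (left commented out; the folder is the one from the question):
# Get-SignatureFileWithEmptyCommaLine -Path 'C:\temp\emailsigs' | ForEach-Object {
#     $text = Get-Content -LiteralPath $_ -Raw -Encoding UTF8
#     Set-Content -LiteralPath $_ -Encoding UTF8 -NoNewline -Value $text.Replace(' , , <br />', '')
# }
```

With over 1100 files and only a handful affected, this keeps the rewrite step limited to the files that need it.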
Upvotes: 0