Reputation: 1070
I have a simple PowerShell script that replaces "false" or "true" with "0" or "1":
$InputFolder = $args[0];
if($InputFolder.Length -lt 3)
{
Write-Host "Enter a path name as your first argument" -foregroundcolor Red
return
}
if(-not (Test-Path $InputFolder)) {
Write-Host "File path does not appear to be valid" -foregroundcolor Red
return
}
Get-ChildItem $InputFolder
$content = [System.IO.File]::ReadAllText($InputFolder).Replace("`"false`"", "`"0`"").Replace("`"true`"", "`"1`"").Replace("`"FALSE`"", "`"0`"").Replace("`"TRUE`"", "`"1`"")
[System.IO.File]::WriteAllText($InputFolder, $content)
[GC]::Collect()
This works fine for almost all files I have to amend, with the exception of one 808MB CSV. I have no idea how many lines are in this CSV, as nothing I have will open it properly.
Interestingly, the PowerShell script will complete successfully when invoked manually via either PowerShell directly or via command prompt. When this is launched as part of the SSIS package it's required for, that's when the error happens.
Sample data for the file:
"RowIdentifier","DateProfileCreated","IdProfileCreatedBy","IDStaffMemberProfileRole","StaffRole","DateEmploymentStart","DateEmploymentEnd","PPAID","GPLocalCode","IDStaffMember","IDOrganisation","GmpID","RemovedData"
"134","09/07/1999 00:00","-1","98","GP Partner","09/07/1999 00:00","14/08/2009 15:29","341159","BRA 871","141","B83067","G3411591","0"
Error message thrown:
I'm not tied to PowerShell - I'm open to other options. I had a cribbed together C# script previously, but that died on small files than this - I'm no C# developer, so was unable to debug it at all.
Any suggestions or help gratefully received.
Upvotes: 1
Views: 2214
Reputation: 438093
Generally, avoiding read large files all at once, as you can run out of memory, as you've experienced.
Instead, process text-based files line by line - both reading and writing.
While PowerShell generally excels at line-by-line (object-by-object) processing, it it is slow with files with many lines.
Using the .NET Framework directly - while more complex - offers much better performance.
If you process the input file line by line, you cannot directly write back to it and must instead write to a temporary output file, which you can replace the input file with on success.
Here's a solution that uses .NET types directly for performance reasons:
# Be sure to use a *full* path, because .NET typically doesn't have the same working dir. as PS.
$inFile = Convert-Path $Args[0]
$tmpOutFile = [io.path]::GetTempFileName()
$tmpOutFileWriter = [IO.File]::CreateText($tmpOutFile)
foreach ($line in [IO.File]::ReadLines($inFile)) {
$tmpOutFileWriter.WriteLine(
$line.Replace('"false"', '"0"').Replace('"true"', '"1"').Replace('"FALSE"', '"0"').Replace('"TRUE"', '"1"')
)
}
$tmpOutFileWriter.Dispose()
# Replace the input file with the temporary file.
# !! BE SURE TO MAKE A BACKUP COPY FIRST.
# -WhatIf *previews* the move operation; remove it to perform the actual move.
Move-Item -Force -LiteralPath $tmpOutFile $inFile -WhatIf
Note:
UTF-8 encoding is assumed, and the rewritten file will not have a BOM. You can change this by specifying the desired encoding to the .NET methods.
As an aside: Your chain of .Replace()
calls on each input line can be simplified as follows, using PowerShell's -replace
operator, which is case-insensitive, so only 2 replacements are needed:
$line -replace '"false"', '"0"' -replace '"true"', '"1"'
However, while that is shorter to write, it is actually slower than the .Replace()
call chain, presumably because -replace
is regex-based, which incurs extra processing.
Upvotes: 3
Reputation: 6860
You could read the file Per line with get-content -readcount, Out-file a temp file, then delete old file and rename-item the temp file the old files name.
Small things that would need fixing. This will add a new empty line at end of file. This will change the encoding. You could try and get the current file encoding and set the encoding on the Out-file -encoding
function Replace-LargeFilesInFolder(){
Param(
[string]$DirectoryPath,
[string]$OldString,
[string]$NewString,
[string]$TempExtention = "temp",
[int]$LinesPerRead = 500
)
Get-ChildItem $DirectoryPath -File | %{
$File = $_
Get-Content $_.FullName -ReadCount $LinesPerRead |
%{
$_ -replace $OldString, $NewString |
out-file "$($File.FullName).$($TempExtention)" -Append
}
Remove-Item $File.FullName
Rename-Item "$($File.FullName).$($TempExtention)" -NewName $($File.FullName)
}
}
Replace-LargeFilesInFolder -DirectoryPath C:\TEST -LinesPerRead 1 -OldString "a" -NewString "5"
Upvotes: 1