Reputation: 81
I have a small function that seems to cause my server to run out of memory, but I can't work out why. I read in a large CSV (3 million lines), and then for each line I attempt to a) copy a file and b) unzip it by calling Start-Process:
$c = 0;
foreach ($line in $csv) {
    $c = $c + 1
    Write-Host "Processing item $c of $total"
    $folder = $line.destination.Substring( 0, $line.destination.LastIndexOf("\") )
    if (Test-Path $folder) {
        Write-Debug "Folder Exists; "
    } Else {
        Write-Debug "Folder being created; "
        mkdir $folder
    }
    if (Test-Path $line.original) {
        Write-Debug "File to be processed; "
        Write-Debug $line.original
        Write-Debug $line.destination
        try {
            Copy-Item $line.original $line.destination
        }
        catch [System.ArgumentException] {
            Write-Warning "ERROR: Could not copy"
            Write-Warning "Check file, FROM: $($line.original)"
            Write-Warning "Check file, TO : $($line.destination)"
        }
        $arguments = "-d", "-f", "`"$($line.destination)`""
        try {
            Start-Process -FilePath $command -ArgumentList $arguments -RedirectStandardOutput stdout.txt -RedirectStandardError stderr.txt -WindowStyle Hidden
        }
        catch [System.ArgumentException] {
            Write-Warning "ERROR: Could not unzip"
            Write-Warning "Check file, FROM: $($line.original)"
            Write-Warning "Check file, TO : $($line.destination)"
        }
    } Else {
        Write-Warning "ERROR: File not found, line $c"
        Write-Warning "Check file, FROM: $($line.original)"
        Write-Warning "Check file, TO : $($line.destination)"
    }
}
At about line 220,000 or so of the 3 million, I start getting a few errors. I'm attributing them to RAM, but they might not be memory-related; Google hasn't helped me work them out so far, so I'm wondering if there is a memory leak in the script (even though the PowerShell process doesn't grow over time).
Write-Host : The Win32 internal error "Insufficient quota to complete the requested
service" 0x5AD occurred while getting console output buffer information. Contact
Microsoft Customer Support Services.
start-process : This command cannot be run due to the error: Only part of a
ReadProcessMemory or WriteProcessMemory request was completed.
out-lineoutput : The Win32 internal error "Insufficient quota to complete the requested
service" 0x5AD occurred while getting console output buffer information. Contact Microsoft
Customer Support Services.
Upvotes: 0
Views: 893
Reputation: 2542
When you use foreach the way you do here, the entire contents of the CSV are held in memory, which for a CSV of 3 million lines is going to be substantial. This is where the pipeline can help you out.
You should leverage the pipeline to stream the data, which will lower the memory consumption if you do it right. To get you started, consider the following:
Import-Csv -path 'c:\temp\input.csv' | Foreach-Object {
# code for stuff you want to do for each csv line
}
This code will start reading the CSV one line at a time and pass each line to the next command through the pipeline. The line then hits ForEach-Object, which executes whatever code is in the scriptblock for each input from the pipeline. You can send data further down the pipeline in this fashion if you need to (for instance to update a file, etc.), as in the sketch below.
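For example, here is a minimal sketch (the Original/Destination/Exists property names in the output object are my own illustrative choices) that emits a small status object per line and streams it straight to a result CSV, so nothing accumulates in memory:
# Minimal sketch: emit one status object per input line and stream it out.
Import-Csv -Path 'c:\temp\input.csv' | ForEach-Object {
    [pscustomobject]@{
        Original    = $_.original
        Destination = $_.destination
        Exists      = Test-Path $_.original
    }
} | Export-Csv -Path 'c:\temp\result.csv' -NoTypeInformation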
The main thing to know is that it streams the data instead of reading everything into memory in one go, as your script does. Reading everything at once can be desirable, because if you have the RAM to spare it's often faster, but in your case you should sacrifice some speed so you are not running out of memory.
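For contrast, the all-at-once pattern your script uses looks like this (a sketch with a hypothetical path); the whole 3-million-row collection sits in $csv for the duration of the loop:
# All-at-once pattern: the entire file is parsed into $csv before the loop
# starts, so every row is held in memory at the same time.
$csv = Import-Csv -Path 'c:\temp\input.csv'
foreach ($line in $csv) {
    # per-line work goes here
}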
Hopefully you will get other suggestions on this question as well, but in the meantime read up on the pipeline if needed, try to refactor your script to use it, and see if it helps.
UPDATE! I have tried to rewrite the portion of your code that I have available to me, using the pipeline. You should be able to just copy/paste this into your script (remember to take a backup copy first!)
$pathCSV = 'insert\path\to\csvfile.csv'
$command = 'something'

Import-Csv -Path $pathCSV | ForEach-Object {
    try {
        $line = $_
        $folder = $line.destination.Substring( 0, $line.destination.LastIndexOf('\') )
        if (-not (Test-Path $folder)) {
            New-Item -Path $folder -ItemType 'Directory'
            Write-Verbose "$folder created"
        }
        else {
            Write-Verbose "$folder already exists"
        }
        if (Test-Path $line.original) {
            Write-Verbose "File to be processed: $($line.original) [original] - $($line.destination) [destination]"
            Copy-Item $line.original $line.destination
            $arguments = '-d', '-f', "`"$($line.destination)`""
            Start-Process -FilePath $command -ArgumentList $arguments -RedirectStandardOutput stdout.txt -RedirectStandardError stderr.txt -WindowStyle Hidden
            # run garbage collection to try to free up some memory; if this slows down
            # the script too much, comment these lines out
            [gc]::Collect()
            [gc]::WaitForPendingFinalizers()
        }
        else {
            Write-Warning "File not found: $($line.original) [original] - $($line.destination) [destination]"
        }
    }
    catch {
        Write-Warning "At line:$($_.InvocationInfo.ScriptLineNumber) char:$($_.InvocationInfo.OffsetInLine) Command:$($_.InvocationInfo.InvocationName), Exception: '$($_.Exception.Message.Trim())'"
    }
}
Just remember to fill out the path to the CSV and set $command to whatever your unzip tool is, as in the example below.
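For instance (hypothetical values; substitute your real CSV path and unzip tool):
$pathCSV = 'c:\temp\input.csv'
$command = 'c:\tools\gzip.exe'   # hypothetical: any tool that accepts -d -f <file>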
Hopefully this will work, or at least give you something to work further on.
Upvotes: 2