Susan Melton
Susan Melton

Reputation: 53

In Powershell I'm receiving an "OutOfMemoryException" when working with files over 1gb

I am doing some file clean up before loading into my data warehouse and have run into a file sizing issue:

(Get-Content -path C:\Workspace\workfile\myfile.txt -Raw) -replace '\\"', '"' | Set-Content C:\Workspace\workfile\myfileCLEAN.txt

My file is about 2GB. I am receiving the following error and not sure how to correct.

Get-Content : Exception of type 'System.OutOfMemoryException' was thrown, ........

I am NOT a coder, but I do like learning so am building my own data warehouse. So if you do respond, keep my experience level in mind :)

Upvotes: 5

Views: 2874

Answers (3)

mklement0
mklement0

Reputation: 439257

  • A performant way of reading a text file line by line - without loading the entire file into memory - is to use a switch statement with the -File parameter.

  • A performant way of writing a text file is to use a System.IO.StreamWriter instance.

  • As Mathias points out in his answer, using verbatim \" with the regex-based -replace operator actually replaces " alone, due to the escaping rules of regexes. While you could address that with '\\"', in this case a simpler and better-performing alternative is to use the [string] type's Replace() method, which operates on literal substrings.

To put it all together:

# Note: Be sure to use a *full* path, because .NET's working dir. usually
#       differs from PowerShell's.
$streamWriter = [System.IO.StreamWriter]::new('C:\Workspace\workfile\myfileCLEAN.txt')

switch -File C:\Workspace\workfile\myfile.txt {
  default { $streamWriter.WriteLine($_.Replace('\"', '"')) }
}

$streamWriter.Close()

Note: If you're using an old version of Windows PowerShell, namely version 4 or below, use
New-Object System.IO.StreamWriter 'C:\Workspace\workfile\myfileCLEAN.txt'
instead of
[System.IO.StreamWriter]::new('C:\Workspace\workfile\myfileCLEAN.txt')

Upvotes: 2

Philip Wrage
Philip Wrage

Reputation: 1569

Get-Content loads the whole file into memory.

Try processing line by line to improve memory utilization.

$infile = "C:\Workspace\workfile\myfile.txt"
$outfile = "C:\Workspace\workfile\myfileCLEAN.txt"

foreach ($line in [System.IO.File]::ReadLines($infile)) {
    Add-Content -Path $outfile -Value ($line -replace '\\"','"')
}

Upvotes: 0

Mathias R. Jessen
Mathias R. Jessen

Reputation: 174690

Get-Content -Raw makes PowerShell read the entire file into a single string.

.NET can't store individual objects over 2GB in size in memory, and each character in a string takes up 2 bytes, so after reading the first ~1 billion characters (roughly equivalent to a 1GB ASCII-encoded text file), it reaches the memory limit.

Remove the -Raw switch, -replace is perfectly capable of operating on multiple input strings at once:

(Get-Content -path C:\Workspace\workfile\myfile.txt) -replace '\"', '"' | Set-Content C:\Workspace\workfile\myfileCLEAN.txt

Beware that -replace is a regex operator, and if you want to remove \ from a string, you need to escape it:

(Get-Content -path C:\Workspace\workfile\myfile.txt) -replace '\\"', '"' | Set-Content C:\Workspace\workfile\myfileCLEAN.txt

While this will work, it'll still be slow due to the fact that we're still loading >2GB of data into memory before applying -replace and writing to the output file.

Instead, you might want to pipe the output from Get-Content to the ForEach-Object cmdlet:

Get-Content -path C:\Workspace\workfile\myfile.txt |ForEach-Object {
  $_ -replace '\\"','"'
} |Set-Content C:\Workspace\workfile\myfileCLEAN.txt

This allows Get-Content to start pushing output prior to finishing reading the file, and PowerShell therefore no longer needs to allocate as much memory as before, resulting in faster execution.

Upvotes: 2

Related Questions