Spacehamster

Reputation: 195

Out-of-memory error while reading very large text file in vb.net

I've been tasked with processing a 3.2 GB fixed-width text file. Each line is 1563 chars long, and there are approximately 2.1 million lines in the file. After reading about 1 million lines, my program crashes with an out-of-memory exception.

Imports System.IO
Imports Microsoft.VisualBasic.FileIO

Module TestFileCount
    ''' <summary>
    ''' Gets the total number of lines in a text file by reading a line at a time
    ''' </summary>
    ''' <remarks>Crashes when count reaches 1018890</remarks>
    Sub Main()
        Dim inputfile As String = "C:\Split\BIGFILE.txt"
        Dim count As Int32 = 0
        Dim lineoftext As String = ""

        If File.Exists(inputfile) Then
            Dim _read As New StreamReader(inputfile)
            Try
                While (_read.Peek <> -1)
                    lineoftext = _read.ReadLine()
                    count += 1
                End While

                Console.WriteLine("Total Lines in " & inputfile & ": " & count)
            Catch ex As Exception
                Console.WriteLine(ex.Message)
            Finally
                _read.Close()
            End Try
        End If
    End Sub
End Module

It's a pretty straightforward program that reads the text file one line at a time, so I assume it shouldn't take up too much memory in the buffer.

For the life of me, I can't figure out why it's crashing. Does anyone here have any ideas?

Upvotes: 5

Views: 6015

Answers (2)

saysansay

Reputation: 96

Try using ReadAsync, or you can use DiscardBufferedData (but this is slow).

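Imports System
Imports System.IO
Imports System.Threading.Tasks

Module AsyncLineCount
    Sub Main()
        Dim inputfile As String = "C:\Example\existingfile.txt"
        Try
            ' Block on the async call, since Sub Main here is not Async.
            Dim count As Long = CountLinesAsync(inputfile).GetAwaiter().GetResult()
            Console.WriteLine("Total Lines in " & inputfile & ": " & count)
        Catch ex As Exception
            Console.WriteLine(ex.Message)
        End Try
    End Sub

    ' Reads the file in fixed-size chunks so the whole file is never held in memory.
    ' Counts line-feed characters, so a final line without a trailing newline is not counted.
    Async Function CountLinesAsync(path As String) As Task(Of Long)
        Dim buffer(65535) As Char
        Dim count As Long = 0
        Using reader As StreamReader = File.OpenText(path)
            Dim charsRead As Integer = Await reader.ReadAsync(buffer, 0, buffer.Length)
            While charsRead > 0
                For i As Integer = 0 To charsRead - 1
                    If buffer(i) = ControlChars.Lf Then count += 1
                Next
                charsRead = Await reader.ReadAsync(buffer, 0, buffer.Length)
            End While
        End Using
        Return count
    End Function
End Module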

Upvotes: 0

Scott Chamberlain

Reputation: 127563

I don't know if this will fix your problem, but don't use Peek. Change your loop to the following (this is C#, but you should be able to translate it to VB):

while (_read.ReadLine() != null)
{
    count += 1;
}

If you need to use the line of text inside the loop, rather than only counting lines, modify the code to:

while ((lineoftext = _read.ReadLine()) != null)
{
    count += 1;
    // Do something with lineoftext
}
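
For reference, a rough VB.NET translation of the second loop might look like the sketch below. It assumes the same file path as in the question; the `LineCounter` module and `CountLines` helper are names made up for illustration, not anything from the original code.

```vbnet
Imports System
Imports System.IO

Module LineCounter
    ' Hypothetical helper: counts lines by testing ReadLine's result instead of calling Peek.
    Function CountLines(path As String) As Long
        Dim count As Long = 0
        Using reader As New StreamReader(path)
            Dim lineoftext As String = reader.ReadLine()
            ' ReadLine returns Nothing once the end of the file is reached.
            While lineoftext IsNot Nothing
                count += 1
                ' Do something with lineoftext here.
                lineoftext = reader.ReadLine()
            End While
        End Using
        Return count
    End Function

    Sub Main()
        ' Path taken from the question.
        Console.WriteLine("Total Lines: " & CountLines("C:\Split\BIGFILE.txt"))
    End Sub
End Module
```

The `Using` block takes the place of the Try/Finally with `_read.Close()` in the question.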

This is kind of off topic and kind of cheating, but if each line really is 1563 chars long (including the line ending) and the file is pure ASCII (so every char takes up one byte), you could just divide the file size by the line length (once again C#, but you should be able to translate it):

using System;
using System.IO;

long bytesPerLine = 1563;
string inputfile = @"C:\Split\BIGFILE.txt"; // The @ symbol is so we don't have to escape the `\`
long length;

using (FileStream stream = File.Open(inputfile, FileMode.Open)) // This is the C# equivalent of the try/finally to close the stream when done.
{
    length = stream.Length;
}

Console.WriteLine("Total Lines in {0}: {1}", inputfile, (length / bytesPerLine));
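
A possible VB.NET version of the same idea is sketched below. It assumes, as above, that every record takes exactly 1563 bytes including its line ending. The `FixedWidthCount` module name and the remainder check are additions for illustration, and `FileInfo.Length` is swapped in for opening a `FileStream`.

```vbnet
Imports System
Imports System.IO

Module FixedWidthCount
    Sub Main()
        Dim inputfile As String = "C:\Split\BIGFILE.txt"
        ' Assumption: each record is exactly this many bytes, line ending included.
        Dim bytesPerLine As Long = 1563
        ' FileInfo reads the size from the file system without opening the file.
        Dim length As Long = New FileInfo(inputfile).Length

        ' If the size is not an exact multiple, the fixed-width assumption is suspect.
        If length Mod bytesPerLine <> 0 Then
            Console.WriteLine("Warning: file size is not a multiple of " & bytesPerLine & " bytes.")
        End If

        ' The \ operator is integer division in VB.
        Console.WriteLine("Total Lines in {0}: {1}", inputfile, length \ bytesPerLine)
    End Sub
End Module
```

If the warning fires, the line length probably does not include the line ending (or the file is not single-byte text), and the division will give the wrong count.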

Upvotes: 1
