Reputation: 195
I've been tasked with processing a 3.2GB fixed-width delimited text file. Each line is 1563 chars long, and there are approximately 2.1 million lines in the file. After reading about 1 million lines, my program crashes with an out-of-memory exception.
Imports System.IO
Imports Microsoft.VisualBasic.FileIO
Module TestFileCount
    ''' <summary>
    ''' Gets the total number of lines in a text file by reading a line at a time
    ''' </summary>
    ''' <remarks>Crashes when count reaches 1018890</remarks>
    Sub Main()
        Dim inputfile As String = "C:\Split\BIGFILE.txt"
        Dim count As Int32 = 0
        Dim lineoftext As String = ""
        If File.Exists(inputfile) Then
            Dim _read As New StreamReader(inputfile)
            Try
                While (_read.Peek <> -1)
                    lineoftext = _read.ReadLine()
                    count += 1
                End While
                Console.WriteLine("Total Lines in " & inputfile & ": " & count)
            Catch ex As Exception
                Console.WriteLine(ex.Message)
            Finally
                _read.Close()
            End Try
        End If
    End Sub
End Module
It's a pretty straightforward program that reads the text file one line at a time, so I assume it shouldn't take up too much memory in the buffer.
For the life of me, I can't figure out why it's crashing. Does anyone here have any ideas?
Upvotes: 5
Views: 6015
Reputation: 96
Try using ReadAsync, or you can use DiscardBufferedData (but this is slow).
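' Requires: Imports System.IO and Imports System.Threading.Tasks
' Reads the file in fixed-size chunks and counts line feeds, so memory use stays constant.
' (A buffer sized to the whole file would itself run out of memory.)
' Note: a last line with no trailing line feed is not counted.
Async Function CountLinesAsync(inputfile As String) As Task(Of Long)
    Dim buffer(65535) As Char
    Dim count As Long = 0
    Using reader As StreamReader = File.OpenText(inputfile)
        Dim charsRead As Integer = Await reader.ReadAsync(buffer, 0, buffer.Length)
        While charsRead > 0
            For i As Integer = 0 To charsRead - 1
                If buffer(i) = ControlChars.Lf Then count += 1
            Next
            charsRead = Await reader.ReadAsync(buffer, 0, buffer.Length)
        End While
    End Using
    Return count
End Function

' Call it from synchronous code such as Sub Main:
Dim inputfile As String = "C:\Example\existingfile.txt"
Try
    Dim count As Long = CountLinesAsync(inputfile).GetAwaiter().GetResult()
    Console.WriteLine("Total Lines in " & inputfile & ": " & count)
Catch ex As Exception
    Console.WriteLine(ex.Message)
End Try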
Upvotes: 0
Reputation: 127563
I don't know if this will fix your problem, but don't use Peek. Change your loop to the following (this is C#, but you should be able to translate it to VB):
while (_read.ReadLine() != null)
{
    count += 1;
}
If you need to use the line of text inside the loop instead of just counting lines, modify the code to:
while ((lineoftext = _read.ReadLine()) != null)
{
    count += 1;
    //Do something with lineoftext
}
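For anyone who wants the VB version, a rough translation of the second loop might look like the sketch below. VB has no assignment-inside-the-condition form, so the read happens once before the loop and again at the end of each pass. The module name `LineCounter` and the function name `CountLines` are made up for illustration, and the count is a `Long` only as a precaution against very large files.

```vbnet
Imports System
Imports System.IO

Module LineCounter
    ' Counts lines by calling ReadLine until it returns Nothing (end of file).
    ' No Peek call is involved.
    Function CountLines(path As String) As Long
        Dim count As Long = 0
        ' Using closes the reader even if an exception is thrown.
        Using reader As New StreamReader(path)
            Dim lineoftext As String = reader.ReadLine()
            While lineoftext IsNot Nothing
                count += 1
                ' Do something with lineoftext here.
                lineoftext = reader.ReadLine()
            End While
        End Using
        Return count
    End Function

    Sub Main()
        Dim inputfile As String = "C:\Split\BIGFILE.txt"
        Console.WriteLine("Total Lines in " & inputfile & ": " & CountLines(inputfile))
    End Sub
End Module
```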
This is kind of off topic and kind of cheating, but if each line really is 1563 chars long (including the line ending) and the file is pure ASCII (so every char takes up one byte), you could just do this (once again C#, but you should be able to translate it):
long bytesPerLine = 1563;
string inputfile = @"C:\Split\BIGFILE.txt"; //The @ symbol is so we don't have to escape the `\`
long length;
using (FileStream stream = File.Open(inputfile, FileMode.Open)) //This is the C# equivalent of the try/finally to close the stream when done.
{
    length = stream.Length;
}
Console.WriteLine("Total Lines in {0}: {1}", inputfile, (length / bytesPerLine));
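A VB sketch of the same shortcut, under the same assumptions (pure ASCII, every line exactly 1563 bytes including the line ending), could look like this. It reads the size through `FileInfo.Length` instead of opening a stream, and the remainder check is an extra I added: a non-zero remainder means at least one line does not have the assumed length, so the result is only an estimate. `EstimateLineCount` and the module name are hypothetical names.

```vbnet
Imports System
Imports System.IO

Module LineCountByLength
    ' Assumes every line, terminator included, occupies exactly bytesPerLine bytes.
    Function EstimateLineCount(fileLength As Long, bytesPerLine As Long) As Long
        Return fileLength \ bytesPerLine   ' \ is integer division in VB
    End Function

    Sub Main()
        Dim inputfile As String = "C:\Split\BIGFILE.txt"
        Dim bytesPerLine As Long = 1563
        ' FileInfo.Length reports the size in bytes without reading the file.
        Dim length As Long = New FileInfo(inputfile).Length

        If length Mod bytesPerLine <> 0 Then
            Console.WriteLine("Warning: file size is not a multiple of {0} bytes.", bytesPerLine)
        End If
        Console.WriteLine("Total Lines in {0}: {1}", inputfile, EstimateLineCount(length, bytesPerLine))
    End Sub
End Module
```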
Upvotes: 1