Curious_Bop

Separating large file and inserting carriage returns based on string

I'm new to VB.NET, but a friend recommended it for what I'm trying to do. I have a huge text file and I want to insert a carriage return after every occurrence of a specific string.

Starting from the rough code below, how would I change it so that it reads the file and inserts a line break whenever it sees the text "ext"? I expect a single line of the input file to produce a lot of line breaks.

What I've managed to put together so far reads the input file line by line until the end of the file and writes each line out again to another file.

    Imports System.IO

    Module Module1
        Sub Main()
            Try
                ' Create an instance of StreamReader to read from a file.
                ' The Using statement also closes the StreamReader.
                Using sr As StreamReader = New StreamReader("C:\My Documents\input.txt")
                    Dim line As String
                    ' Read lines from the file until the end of the file
                    ' is reached, copying each one to the output file.
                    Using sw As StreamWriter = New StreamWriter("C:\My Documents\output.txt")
                        Do Until sr.EndOfStream
                            line = sr.ReadLine()
                            sw.WriteLine(line)
                        Loop
                    End Using
                End Using
                Console.WriteLine("done")
            Catch e As Exception
                ' Let the user know what went wrong.
                Console.WriteLine("The file could not be read:")
                Console.WriteLine(e.Message)
            End Try
            Console.ReadKey()
        End Sub
    End Module

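Edit: following the comments, I changed Sub Main as shown below. It now falls over on 500 MB files because of memory constraints, presumably because ReadLine has to hold each entire line in memory and some of my lines are huge: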

    Sub Main()
        Try
            ' Create an instance of StreamReader to read from a file.
            ' The Using statement also closes the StreamReader.
            Using sr As StreamReader = New StreamReader("C:\My Documents\input.txt")
                Dim line As String
                Dim term As String = "</ext>"
                ' Read lines from the file until the end of the file is
                ' reached, adding a line break after each occurrence of term.
                Using sw As StreamWriter = New StreamWriter("C:\My Documents\output.txt")
                    Do Until sr.EndOfStream
                        line = sr.ReadLine()
                        line = line.Replace(term, term & Environment.NewLine)
                        sw.WriteLine(line)
                    Loop
                End Using
            End Using
            Console.WriteLine("done")
        Catch e As Exception
            ' Let the user know what went wrong.
            Console.WriteLine("The file could not be read:")
            Console.WriteLine(e.Message)
        End Try
        Console.ReadKey()
    End Sub

Upvotes: 0

Views: 469

Answers (1)

the_lotus

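Since your lines are very large, ReadLine has to hold an entire line in memory, which is why you run out. Instead, you'll have to:

  • Read and write one character at a time
  • Keep the last x characters read, where x is the length of your term
  • If the last x characters equal your term, write a new line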

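    ' Requires Imports System.IO at the top of the file.
    Dim term As String = "</ext>"
    ' Sliding window holding the last term.Length characters read,
    ' pre-filled with spaces so it starts at the right length.
    Dim lastChars As String = "".PadRight(term.Length)

    Using sw As StreamWriter = New StreamWriter("C:\My Documents\output.txt")
        Using sr As New StreamReader("C:\My Documents\input.txt")
            While Not sr.EndOfStream
                ' Read() returns the next character as an Integer.
                Dim ch As Char = ChrW(sr.Read())

                ' Append the new character and drop the oldest one.
                lastChars &= ch
                lastChars = lastChars.Remove(0, 1)

                sw.Write(ch)

                If lastChars = term Then
                    sw.Write(Environment.NewLine)
                End If
            End While
        End Using
    End Using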

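Note: StreamReader decodes the file using its encoding (UTF-8 by default, or whatever a byte order mark indicates), so this compares characters rather than raw bytes. The StreamWriter, however, writes UTF-8 unless you pass it an encoding, so a UTF-16 input file will come out as UTF-8.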
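The character-by-character loop works, but each `&=` and `Remove` allocates a new string, which adds up over 500 MB. A minimal sketch of a buffered alternative, assuming the input can be read through any TextReader: it reads 64K characters at a time and tracks how much of the term has been matched so far with a Knuth-Morris-Pratt (KMP) failure table. This swaps in a different matching technique from the answer's string comparison. The names `InsertBreaks`, `BuildFailure` and `SplitOnTerm`, the buffer size and the sample input are hypothetical, and the term is assumed to be non-empty.

```vb
Imports System
Imports System.IO

Module SplitOnTerm
    ' Hypothetical helper: KMP failure table. fail(i) is the length of the
    ' longest proper prefix of term(0..i) that is also a suffix of it.
    Function BuildFailure(term As String) As Integer()
        Dim fail(term.Length - 1) As Integer
        Dim k As Integer = 0
        For i As Integer = 1 To term.Length - 1
            While k > 0 AndAlso term(i) <> term(k)
                k = fail(k - 1)
            End While
            If term(i) = term(k) Then k += 1
            fail(i) = k
        Next
        Return fail
    End Function

    ' Hypothetical helper: copies input to output in blocks, writing a line
    ' break after every occurrence of term (assumed non-empty).
    Sub InsertBreaks(input As TextReader, output As TextWriter, term As String)
        Dim fail As Integer() = BuildFailure(term)
        Dim buffer(65535) As Char
        Dim matched As Integer = 0   ' characters of term matched so far
        Dim n As Integer = input.Read(buffer, 0, buffer.Length)
        While n > 0
            For i As Integer = 0 To n - 1
                Dim ch As Char = buffer(i)
                output.Write(ch)
                ' Fall back through the table until ch can extend the match.
                While matched > 0 AndAlso ch <> term(matched)
                    matched = fail(matched - 1)
                End While
                If ch = term(matched) Then matched += 1
                If matched = term.Length Then
                    output.Write(Environment.NewLine)
                    matched = fail(matched - 1)
                End If
            Next
            n = input.Read(buffer, 0, buffer.Length)
        End While
    End Sub

    Sub Main()
        Using sr As New StringReader("a</ext>b</ext></ext>c")
            Using sw As New StringWriter()
                InsertBreaks(sr, sw, "</ext>")
                ' Prints a</ext>, b</ext>, </ext> and c on separate lines.
                Console.Write(sw.ToString())
            End Using
        End Using
    End Sub
End Module
```

To process the real file, pass `New StreamReader("C:\My Documents\input.txt")` and `New StreamWriter("C:\My Documents\output.txt")` instead of the StringReader and StringWriter used in Main. Only one 64K buffer and a single integer of match state are kept in memory, regardless of how long the lines are.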

Upvotes: 0
