Long Bunly

Reputation: 13

VB.NET: how to compare large text files

Hi all. The code below compares the contents of two text files, and it works fine for the records in the files. My issue is that when the files have a lot of lines (80,000 and up), the code runs very slowly, which I cannot accept. Please give me some ideas.

Public Class Form1

Const TEST1 = "D:\a.txt"
Const TEST2 = "D:\b.txt"
Public file1 As New Dictionary(Of String, String)
Public file2 As New Dictionary(Of String, String)
Public text1 As String()
Public i As Integer
Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    'Declare two dictionaries. The key for each will be the text from the input line up to,
    'but not including, the first ",". The value for each will be the entire input line.

    'Dim file1 As New Dictionary(Of String, String)
    'Dim file2 As New Dictionary(Of String, String)
    'Dim text1 As String()
    For Each line As String In System.IO.File.ReadAllLines(TEST1)
        Dim part() As String = line.Split(",")
        file1.Add(part(0), line)

    Next

    For Each line As String In System.IO.File.ReadAllLines(TEST2)
        Dim part() As String = line.Split(",")
        file2.Add(part(0), line)
    Next

    ' AddText("The following lines from " & TEST2 & " are also in " & TEST1)

    For Each key As String In file1.Keys

        If file2.ContainsKey(key) Then
            TextBox1.Text &= (file1(key)) & vbCrLf
            MsgBox(file2(key))
            Label1.Text = file1(key)
        Else
            TextBox2.Text &= (file1(key)) & vbCrLf
        End If
    Next
    text1 = TextBox1.Lines
    IO.File.WriteAllLines("D:\Same.txt", text1)
    text1 = TextBox2.Lines
    IO.File.WriteAllLines("D:\Differrent.txt", text1)

End Sub

End Class

Upvotes: 1

Views: 765

Answers (1)

Steve

Reputation: 216358

The first thing I would change is the use of a Dictionary. I would use a HashSet instead, because only the keys of the second file are needed to test whether a line is present. See HashSet versus Dictionary
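
As a minimal sketch of that idea, the helper below collects only the key of each line into a HashSet. The name BuildKeySet is hypothetical and used only for illustration, and the sketch assumes, as in the question, that the key is the text before the first comma.

```vbnet
Imports System
Imports System.Collections.Generic

Module KeySetSketch
    ' Collects the text before the first comma of each line into a set.
    ' (BuildKeySet is a hypothetical helper name, used only for illustration.)
    Function BuildKeySet(lines As IEnumerable(Of String)) As HashSet(Of String)
        Dim keys As New HashSet(Of String)()
        For Each line As String In lines
            ' HashSet.Add returns False for a key already in the set;
            ' Dictionary.Add would throw ArgumentException instead.
            keys.Add(line.Split(","c)(0))
        Next
        Return keys
    End Function
End Module
```

For example, BuildKeySet(New String() {"1,a", "1,b", "2,c"}) yields a set holding the two keys "1" and "2", and the repeated key does not raise an exception.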

Then I would change the ReadAllLines loop. ReadAllLines loads every line into memory before the loop starts, while ReadLines reads the file lazily, so you can start working on each line as soon as it is read.
See What's the fastest way to read a text file line-by-line?
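
To illustrate the difference, here is a small sketch that searches a file for one key and stops at the first match. Because File.ReadLines yields one line at a time, the rest of the file is never read. FirstLineWithKey is a hypothetical helper, not part of the code in the question.

```vbnet
Imports System
Imports System.IO

Module ReadLinesSketch
    ' Returns the first line whose first comma-separated field equals key,
    ' or Nothing when no line matches. (Hypothetical helper, for illustration.)
    Function FirstLineWithKey(path As String, key As String) As String
        ' File.ReadLines yields lines one at a time, so the loop can exit
        ' early without reading the remainder of the file.
        For Each line As String In File.ReadLines(path)
            If line.Split(","c)(0) = key Then
                Return line
            End If
        Next
        Return Nothing
    End Function
End Module
```

With File.ReadAllLines in the same loop, the whole file would be loaded into an array before the first comparison is made.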

The third point is to switch the order in which the files are read: read TEST2 first, then TEST1. That way, while you read the lines of TEST1, you can immediately check whether the file2 HashSet contains the key, adding each matching line to a list of found strings and each non-matching line to a list of not-found strings.

Dim TEST1 = "D:\temp\test3.txt"
Dim TEST2 = "D:\temp\test6.txt"

' Collect only the keys (text before the first comma) of every line in TEST2.
Dim file2Keys As New HashSet(Of String)()

For Each line As String In System.IO.File.ReadLines(TEST2)
    Dim parts = line.Split(","c)
    file2Keys.Add(parts(0))
Next

Dim listFound As New List(Of String)()
Dim listNFound As New List(Of String)()

' Read TEST1 line by line and sort each line into found / not found.
For Each line As String In System.IO.File.ReadLines(TEST1)
    Dim parts = line.Split(","c)
    If file2Keys.Contains(parts(0)) Then
        listFound.Add(line)
    Else
        listNFound.Add(line)
    End If
Next

' Write each result file once, after the loop has finished.
IO.File.WriteAllLines("D:\temp\Same.txt", listFound)
IO.File.WriteAllLines("D:\temp\Differrent.txt", listNFound)

Upvotes: 2
