Reputation: 13
Hi all, the code below compares the contents of two text files by record, and it works correctly. My problem is that when the files have a lot of lines (80,000 or more), it runs very, very slowly, which is not acceptable for me. Could you please give me some ideas?
Public Class Form1
    Const TEST1 = "D:\a.txt"
    Const TEST2 = "D:\b.txt"
    Public file1 As New Dictionary(Of String, String)
    Public file2 As New Dictionary(Of String, String)
    Public text1 As String()
    Public i As Integer

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        'Declare two dictionaries. The key for each will be the text from the input line up to,
        'but not including, the first ",". The value for each will be the entire input line.
        'Dim file1 As New Dictionary(Of String, String)
        'Dim file2 As New Dictionary(Of String, String)
        'Dim text1 As String()
        For Each line As String In System.IO.File.ReadAllLines(TEST1)
            Dim part() As String = line.Split(",")
            file1.Add(part(0), line)
        Next
        For Each line As String In System.IO.File.ReadAllLines(TEST2)
            Dim part() As String = line.Split(",")
            file2.Add(part(0), line)
        Next
        ' AddText("The following lines from " & TEST2 & " are also in " & TEST1)
        For Each key As String In file1.Keys
            If file2.ContainsKey(key) Then
                TextBox1.Text &= (file1(key)) & vbCrLf
                MsgBox(file2(key))
                Label1.Text = file1(key)
            Else
                TextBox2.Text &= (file1(key)) & vbCrLf
            End If
        Next
        text1 = TextBox1.Lines
        IO.File.WriteAllLines("D:\Same.txt", text1)
        text1 = TextBox2.Lines
        IO.File.WriteAllLines("D:\Differrent.txt", text1)
    End Sub
End Class
Upvotes: 1
Views: 765
Reputation: 216358
The first thing I would change is the use of a Dictionary. For the second file you only need the keys, so I would use a HashSet instead. See HashSet versus Dictionary.
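To illustrate the difference, here is a minimal sketch (the module and function names `KeySetDemo` and `BuildKeySet` are hypothetical, not from the question). A `HashSet(Of String)` stores only the keys, and its `Add` method simply returns `False` for a key that is already present, whereas `Dictionary.Add` throws an `ArgumentException`, so a repeated first field would stop the original loop with an error.

```vbnet
Imports System.Collections.Generic

Public Module KeySetDemo
    ' Hypothetical helper: collects the text before the first comma of each line.
    ' The HashSet keeps only the keys (no values), with average O(1) Contains lookups.
    Public Function BuildKeySet(lines As IEnumerable(Of String)) As HashSet(Of String)
        Dim keys As New HashSet(Of String)()
        For Each line As String In lines
            Dim key As String = line.Split(","c)(0)
            ' Add returns False (instead of throwing) when the key is already in the set.
            keys.Add(key)
        Next
        Return keys
    End Function
End Module
```

For example, `BuildKeySet(New String() {"A1,foo", "B2,bar", "A1,baz"})` returns a set with the two keys `A1` and `B2`, and the duplicate `A1` line is silently ignored.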
Then I would change the ReadAllLines loops. ReadAllLines loads every line into memory before the loop starts, while ReadLines returns the lines lazily, so you can start working on each line as soon as it has been read.
See What's the fastest way to read a text file line-by-line?
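A small sketch of what "lazily" means here, assuming a hypothetical helper `FirstKeys` in a hypothetical module `StreamingDemo`: `File.ReadLines` returns an `IEnumerable(Of String)`, so the `For Each` body runs as soon as the first line is available, and leaving the loop early means the remaining lines are never enumerated. `File.ReadAllLines` would first have to build a `String()` containing every line.

```vbnet
Imports System.Collections.Generic
Imports System.IO

Public Module StreamingDemo
    ' Hypothetical helper: returns the keys (text before the first comma)
    ' of at most maxLines lines, reading the file lazily with File.ReadLines.
    Public Function FirstKeys(filePath As String, maxLines As Integer) As List(Of String)
        Dim keys As New List(Of String)()
        For Each line As String In File.ReadLines(filePath)
            ' Leaving the loop disposes the enumerator, which closes the file;
            ' the remaining lines are never enumerated.
            If keys.Count >= maxLines Then Exit For
            keys.Add(line.Split(","c)(0))
        Next
        Return keys
    End Function
End Module
```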
The third point is to switch the order in which the files are read: first read the TEST2 file, then TEST1. That way, while you read the TEST1 lines, you can immediately check whether the file2 HashSet contains the key, adding the line to a list of found strings if it does and to a list of not-found strings if it does not.
Dim TEST1 = "D:\temp\test3.txt"
Dim TEST2 = "D:\temp\test6.txt"

' Load only the keys of the second file
Dim file2Keys As New HashSet(Of String)
For Each line As String In System.IO.File.ReadLines(TEST2)
    Dim parts = line.Split(","c)
    file2Keys.Add(parts(0))
Next

' Read the first file and sort each line into found / not found
Dim listFound As New List(Of String)()
Dim listNFound As New List(Of String)()
For Each line As String In System.IO.File.ReadLines(TEST1)
    Dim parts = line.Split(","c)
    If file2Keys.Contains(parts(0)) Then
        listFound.Add(line)
    Else
        listNFound.Add(line)
    End If
Next

IO.File.WriteAllText("D:\temp\Same.txt", String.Join(Environment.NewLine, listFound.ToArray()))
IO.File.WriteAllText("D:\temp\Differrent.txt", String.Join(Environment.NewLine, listNFound.ToArray()))
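The same two-pass logic can also be wrapped in a routine that works on any sequence of lines, which makes it easy to test without touching the disk. This is a minimal sketch; the names `KeyCompare`, `CompareByKey` and `CompareFiles` are hypothetical. It accepts the output of `File.ReadLines` directly, and `File.WriteAllLines` can write the result lists (note that it ends the last line with a newline, unlike the `String.Join` approach).

```vbnet
Imports System.Collections.Generic
Imports System.IO

Public Module KeyCompare
    ' Hypothetical helper: splits lines1 into lines whose key (text before the
    ' first comma) also appears in lines2, and lines whose key does not.
    Public Sub CompareByKey(lines1 As IEnumerable(Of String),
                            lines2 As IEnumerable(Of String),
                            found As List(Of String),
                            notFound As List(Of String))
        ' Pass 1: keep only the keys of the second input.
        Dim keys2 As New HashSet(Of String)()
        For Each line As String In lines2
            keys2.Add(line.Split(","c)(0))
        Next

        ' Pass 2: classify each line of the first input as it is read.
        For Each line As String In lines1
            If keys2.Contains(line.Split(","c)(0)) Then
                found.Add(line)
            Else
                notFound.Add(line)
            End If
        Next
    End Sub

    ' Hypothetical wrapper: reads both files lazily and writes the two result files.
    Public Sub CompareFiles(path1 As String, path2 As String, samePath As String, diffPath As String)
        Dim found As New List(Of String)()
        Dim notFound As New List(Of String)()
        CompareByKey(File.ReadLines(path1), File.ReadLines(path2), found, notFound)
        File.WriteAllLines(samePath, found)
        File.WriteAllLines(diffPath, notFound)
    End Sub
End Module
```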
Upvotes: 2