Reputation: 41
I have the following structure:
Public Structure asd
    Public Property st As String
    Public Property hash As String
End Structure
and this list: Dim keys As New List(Of asd)
I am using this code to search for duplicates:
For intOuter As Integer = 0 To keys.Count - 2
    For intInner As Integer = intOuter + 1 To keys.Count - 1
        If keys(intOuter).hash = keys(intInner).hash Then
            TextBox1.Text += keys(intOuter).hash + vbNewLine
            TextBox1.Text += keys(intOuter).st + "-" + keys(intInner).st + vbNewLine
        End If
    Next intInner
Next intOuter
However, it takes a long time (the list has over 100,000 elements).
Is there a faster way to find the duplicates, i.e. elements with the same hash (not the same st)?
Upvotes: 0
Views: 125
Reputation: 12748
I haven't tested this, but I would assume a good part of why this is slow is that you write to the textbox every time a match is found. Have you tried measuring the speed without displaying the information in the textbox? I would expect it to be a lot faster. Try collecting the information somewhere else (a StringBuilder or a list of duplicate items) and then write to the textbox only once.
Once the two processes (searching and writing) are separated, you can measure and optimize each of them on its own.
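As a rough sketch of that separation (not tested against your data): the search loop from the question stays the same, but the matches go into a StringBuilder. The module and function names (DuplicateReport, BuildPairReport) are made up for illustration, and AppendLine uses Environment.NewLine, which I'm assuming is acceptable in place of vbNewLine.

```vb
Imports System.Collections.Generic
Imports System.Text

' Same structure as in the question; omit if it is already defined.
Public Structure asd
    Public Property st As String
    Public Property hash As String
End Structure

Module DuplicateReport
    ' Hypothetical helper: runs the question's nested-loop search but
    ' collects the output in memory instead of touching the textbox.
    Function BuildPairReport(keys As List(Of asd)) As String
        Dim sb As New StringBuilder()
        For intOuter As Integer = 0 To keys.Count - 2
            For intInner As Integer = intOuter + 1 To keys.Count - 1
                If keys(intOuter).hash = keys(intInner).hash Then
                    sb.AppendLine(keys(intOuter).hash)
                    sb.AppendLine(keys(intOuter).st & "-" & keys(intInner).st)
                End If
            Next intInner
        Next intOuter
        Return sb.ToString()
    End Function
End Module
```

The textbox is then written once with TextBox1.Text = BuildPairReport(keys). The search itself is still quadratic, so this only removes the display cost.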
Another option, if it is possible, would be to change your data structure to something like a Dictionary(Of String, List(Of String)), where each hash maps to a list of st values.
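Something along these lines could build that dictionary from the existing list in a single pass. This is only a sketch: the names HashIndex and GroupByHash are invented here, and it assumes hash is never Nothing, since a Dictionary does not accept a null key.

```vb
Imports System.Collections.Generic

' Same structure as in the question; omit if it is already defined.
Public Structure asd
    Public Property st As String
    Public Property hash As String
End Structure

Module HashIndex
    ' Hypothetical helper: maps each hash to the list of st values seen with it.
    ' Assumes item.hash is not Nothing (Dictionary keys cannot be null).
    Function GroupByHash(keys As List(Of asd)) As Dictionary(Of String, List(Of String))
        Dim byHash As New Dictionary(Of String, List(Of String))()
        For Each item In keys
            Dim bucket As List(Of String) = Nothing
            If Not byHash.TryGetValue(item.hash, bucket) Then
                ' First time this hash is seen: start a new list for it.
                bucket = New List(Of String)()
                byHash.Add(item.hash, bucket)
            End If
            bucket.Add(item.st)
        Next
        Return byHash
    End Function
End Module
```

Any entry whose list has more than one element is a set of duplicates. Each element is visited once, so the cost grows roughly linearly with the list size rather than quadratically.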
Upvotes: 2
Reputation: 125620
You can use LINQ: group by hash and keep only the groups that contain more than one item.
Dim grouped = keys.GroupBy(Function(x) x.hash) _
                  .Where(Function(g) g.Count() > 1) _
                  .Select(Function(g) New With { .Hash = g.Key, .Items = g.ToList() }) _
                  .ToList()
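For completeness, here is one way the grouped result could be turned into text for the textbox. Treat it as a sketch: LinqDuplicateReport and BuildGroupReport are names invented for this example, and unlike the loop in the question it prints all st values of a group on one line instead of one line per pair.

```vb
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text

' Same structure as in the question; omit if it is already defined.
Public Structure asd
    Public Property st As String
    Public Property hash As String
End Structure

Module LinqDuplicateReport
    ' Hypothetical helper: groups by hash, keeps groups with more than one
    ' element, and formats each group as the hash followed by its st values.
    Function BuildGroupReport(keys As List(Of asd)) As String
        Dim duplicateGroups = keys.GroupBy(Function(x) x.hash) _
                                  .Where(Function(g) g.Count() > 1)
        Dim sb As New StringBuilder()
        For Each g In duplicateGroups
            sb.AppendLine(g.Key)
            sb.AppendLine(String.Join("-", g.Select(Function(x) x.st)))
        Next
        Return sb.ToString()
    End Function
End Module
```

Usage would be a single assignment, TextBox1.Text = BuildGroupReport(keys), so the textbox is updated only once.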
Upvotes: 2