Caydenz

Reputation: 41

Faster way to find list duplicates

I have the following structure:

Public Structure asd
    Public Property st As String
    Public Property hash As String
End Structure

and this list: Dim keys As New List(Of asd)

I am using this code to search for duplicates:

    For intOuter As Integer = 0 To keys.Count - 2
        For intInner As Integer = intOuter + 1 To keys.Count - 1
            If keys(intOuter).hash = keys(intInner).hash Then
                TextBox1.Text += keys(intOuter).hash + vbNewLine
                TextBox1.Text += keys(intOuter).st + "-" + keys(intInner).st + vbNewLine
            End If
        Next intInner
    Next intOuter

However, it takes a lot of time (the list has over 100,000 elements).

Is there a faster way to find the duplicates, meaning elements with the same hash (not the same st)?

Upvotes: 0

Views: 125

Answers (2)

the_lotus

Reputation: 12748

I haven't tested this, but I would assume a good part of why this is slow is that you write to the textbox on every match. Have you tried measuring the speed without displaying the information in the textbox? I would expect it to be a lot faster. Try putting the information somewhere else (a StringBuilder or a list of duplicate items) and then write to the textbox only once.

Once the two processes (searching and writing) are split, you can measure and optimize each of them separately.

Another option, if you can change your data structure, is to use something like a Dictionary(Of String, List(Of String)) where each hash maps to a list of st values.
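As a rough sketch of the "write only once" idea, the question's loop can collect its output in a StringBuilder and return it as a single string. The module and function names (DuplicateReport, BuildDuplicateReport) are made up for illustration; the search itself is unchanged, so it still compares every pair.

```vbnet
Imports System.Collections.Generic
Imports System.Text

Module DuplicateReport
    ' Same structure as in the question.
    Public Structure asd
        Public Property st As String
        Public Property hash As String
    End Structure

    ' Hypothetical helper: runs the question's nested-loop search, but
    ' appends matches to a StringBuilder instead of to TextBox1.Text.
    Public Function BuildDuplicateReport(keys As List(Of asd)) As String
        Dim sb As New StringBuilder()
        For intOuter As Integer = 0 To keys.Count - 2
            For intInner As Integer = intOuter + 1 To keys.Count - 1
                If keys(intOuter).hash = keys(intInner).hash Then
                    sb.AppendLine(keys(intOuter).hash)
                    sb.AppendLine(keys(intOuter).st & "-" & keys(intInner).st)
                End If
            Next intInner
        Next intOuter
        Return sb.ToString()
    End Function
End Module
```

The form code would then assign the result once, for example TextBox1.Text = BuildDuplicateReport(keys), which makes it easy to time the search separately from the display.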
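A minimal sketch of that dictionary layout, assuming hash is never Nothing (a Dictionary rejects a Nothing key). The names HashIndex and GroupByHash are hypothetical. It builds the index in a single pass over the list, so no pairwise comparison is needed.

```vbnet
Imports System.Collections.Generic

Module HashIndex
    ' Same structure as in the question.
    Public Structure asd
        Public Property st As String
        Public Property hash As String
    End Structure

    ' Hypothetical helper: maps each hash to the list of st values that share it.
    ' Assumes item.hash is never Nothing.
    Public Function GroupByHash(keys As List(Of asd)) As Dictionary(Of String, List(Of String))
        Dim index As New Dictionary(Of String, List(Of String))()
        For Each item As asd In keys
            Dim sts As List(Of String) = Nothing
            ' Create the bucket the first time a hash is seen.
            If Not index.TryGetValue(item.hash, sts) Then
                sts = New List(Of String)()
                index.Add(item.hash, sts)
            End If
            sts.Add(item.st)
        Next
        Return index
    End Function
End Module
```

Any entry whose list has a Count greater than 1 is a set of duplicates for that hash.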

Upvotes: 2

MarcinJuraszek

Reputation: 125620

You can use LINQ: group by hash and keep only the groups that contain more than one item.

' Each result holds one duplicated hash and all elements that share it.
Dim grouped = keys.GroupBy(Function(x) x.hash) _
                  .Where(Function(g) g.Count() > 1) _
                  .Select(Function(g) New With { .Hash = g.Key, .Items = g.ToList() }) _
                  .ToList()
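To display the result, one possible approach is to format every duplicate group into a single string and assign it to the textbox once. This is only a sketch: GroupedReport and FormatDuplicates are hypothetical names, and the output lists all st values of a group on one line rather than one line per pair as in the question.

```vbnet
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text

Module GroupedReport
    ' Same structure as in the question.
    Public Structure asd
        Public Property st As String
        Public Property hash As String
    End Structure

    ' Hypothetical helper: one hash line plus one line of joined st values
    ' for every hash that occurs more than once.
    Public Function FormatDuplicates(keys As List(Of asd)) As String
        Dim sb As New StringBuilder()
        Dim duplicates = keys.GroupBy(Function(x) x.hash).Where(Function(grp) grp.Count() > 1)
        For Each g In duplicates
            sb.AppendLine(g.Key)
            sb.AppendLine(String.Join("-", g.Select(Function(x) x.st)))
        Next
        Return sb.ToString()
    End Function
End Module
```

Usage would be a single assignment such as TextBox1.Text = FormatDuplicates(keys).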

Upvotes: 2
