Finding Duplicates by group in CSV

I'm trying to find duplicated records in CSV file. Let me explain what I am trying to do:

Records

Plate 1 has 2 A800 and 2 B500.

Plate 2 has 2 A800 and 2 B500 it's same as Plate 1. I want to detect this situation.

Plate 3 has 1 A800 and 1 B500.

I tried with 2 nested For Each Loop and stored them inside of 2 dimensional array but if numbers is high this approach is very slow.

What is your advice for this situation?

Upvotes: 0

Views: 127

Answers (1)

Robert Richter
Robert Richter

Reputation: 278

Here's an example of using GroupBy. In your GroupBy, be certain use "Key" before every field. If you use

 .GroupBy(Function(F) New With {.Plate = F.Plate,
                                .Part = F.Part}) _

the grouping will NOT work.

   Private Class cPlate_Part
      Public Plate As Integer
      Public Part As String
      Sub New(Plate As Integer, Part As String)
         Me.Plate = Plate
         Me.Part = Part
      End Sub
   End Class

   Private Plate_Parts() As cPlate_Part =
      New cPlate_Part() {New cPlate_Part(1, "A800"),
                         New cPlate_Part(1, "A800"),
                         New cPlate_Part(1, "B500"),
                         New cPlate_Part(1, "B500"),
                         New cPlate_Part(2, "B500"),
                         New cPlate_Part(1, "A800"),
                         New cPlate_Part(2, "A800"),
                         New cPlate_Part(2, "A800"),
                         New cPlate_Part(3, "A800")}

   Private FindDuplicates()
      Dim GroupedParts = Plate_Parts _
                       .GroupBy(Function(F) New With {Key .Plate = F.Plate,
                                                      Key .Part = F.Part}) _
                       .ToList ' If you don't want it re-evaluated every time you use it

      Dim FindDuplicates = GroupedParts _
                         .Where(Function(F) F.Count > 1) _
                         .Select(Function(F) F(0))

      Dim DistinctParts = GroupedParts _
                        .Select(Function(F) F(0))

      For Each P In DistinctParts
         Debug.Print($"Plate={P.Plate}  Part={P.Part}")
      Next
   End Sub

The way GroupBy works is if you have a simple type (such as integer or string), you can simply group by that type. If it is more complex, create a New With and be sure to use keys as I have shown. The result is a iEnumerable of iGrouping, which acts mostly like an iEnumerable, but has a "Key" method.

For example

for each GroupOfParts in GroupedParts
   dim Key=GroupOfParts.Key
   Dim Group=GroupOfParts.ToList ' Usually Unnecessary
   ' Some code
next

EDIT:

Here is an alternate group by using alternate syntax

 Dim GroupedParts_AlternateSyntax = From Parts In Plate_Parts
                                         Group By Parts.Part, parts.plate Into Group

There is a subtle difference in how it works. You need to use ".Group" method for each item to get the group, or an explicit select

Dim GroupedParts_AlternateSyntax = From Parts In Plate_Parts
                                         Group By Parts.Part, parts.plate Into Group
                                         Select Group

Upvotes: 1

Related Questions