80sRocker
80sRocker

Reputation: 327

VB.NET LINQ Query - List rows rows in table1 where data has changed

I am ultimately trying to compare data from two .csv files looking for "only the data rows in table1 that have changed".

I would like to use a LINQ Query to do this. I am using VB.NET and an OleDbDataAdapter to fill two DataTables with the .csv data.

The number of columns in each table will always match but not necessarily the number of rows. I will not know the column names but I will know the Primary Key column index. i.g. table1.Field(Of String)(4). I kept the column count short in the example but keep in mind the column count in my .csv files will vary and can get be as large as 50+ columns.

table1

"John", "Adams", "51 Orange St", "Mechanic", "ID0004", "45.00", "1987"
"Nancy", "Wilson", "77 Westy Park", "HR", "ID0029", "27.00", "1991"

table2

"John", "Adams", "51 Orange St", "Mechanic", "ID0004", "45.00", "1987"
"Nancy", "Wilson", "227 Groove Ln", "HR", "ID0029", "27.00", "1991"
"Pat", "Rita", "51 Orange St", "Mechanic", "ID0017", "21.00", "1987"

Expected Results:

We have two matches between the two tables on .Field(Of String)(4) which is our Key column. However, in this case, we only want to return one row. "Nancy", "Wilson", "77 Westy Park", "HR", "ID0029", "27.00", "1991" as one of the columns data in that row has changed.

Think of table2 as a master table that will not change. We only care about returning rows in table1 that have a matching key in table2 and only if any data has changed. Thanks!

Upvotes: 0

Views: 834

Answers (2)

80sRocker
80sRocker

Reputation: 327

Here is an alternative solution not using LINQ Query which gives me the expected results. However, I am uncertain regarding it's reliability.

Dim lstTable2 As List(Of String) = File.ReadAllLines("C:\Table2.txt").ToList
Dim lstTable1 As List(Of String) = File.ReadAllLines("C:\Table1.txt").ToList
Dim lstChanges As List(Of String) = New List(Of String)(lstTable1.Except(lstTable2))
File.WriteAllLines("C:\Changes.txt", lstChanges.ToArray())

Upvotes: 0

BlueMonkMN
BlueMonkMN

Reputation: 25601

If you are doing this in memory, and you want this optimized, I would not assume that a LINQ query is going to give you the best results. I would put one of the tables into a dictionary with the primary key pointing to the DataRows so you can quickly look up the matching rows as you loop through the other table. But if you are set on doing this with a LINQ query, here's some sample code:

Sub Main()
  Dim t1 As New DataTable
  Dim t2 As New DataTable

  t1.PrimaryKey = {t1.Columns.Add(), t1.Columns.Add()}
  For c As Integer = 1 To 5 : t1.Columns.Add() : Next
  t2.PrimaryKey = {t2.Columns.Add(), t2.Columns.Add()}
  For c As Integer = 1 To 5 : t2.Columns.Add() : Next
  t1.Rows.Add("John", "Adams", "51 Orange St", "Mechanic", "ID0004", 45.0, 1987)
  t2.Rows.Add("John", "Adams", "51 Orange St", "Mechanic", "ID0004", 45.0, 1987)
  t1.Rows.Add("Nancy", "Wilson", "77 Westy Park", "HR", "ID0029", 27.0, 1991)
  t2.Rows.Add("Nancy", "Wilson", "227 Groove Ln", "HR", "ID0029", 27.0, 1991)
  t2.Rows.Add("Pat", "Rita", "51 Orange St", "Mechanic", "ID0017", 21.0, 1987)

  Dim diffs = _
     From row2 In t2 Group Join row1 In t1 _
     On row1(t1.PrimaryKey(0)) Equals row2(t2.PrimaryKey(0)) _
     And row1(t1.PrimaryKey(1)) Equals row2(t2.PrimaryKey(1)) _
     Into Group
     Where Not Group.Any OrElse RowsDifferent(Group.Single, row2)

  For Each diff In diffs
     If diff.Group.Any Then
        For Each col In diff.row2.ItemArray
           Console.Write(col & ",")
        Next
        Console.WriteLine()
     Else
        For Each col In diff.row2.ItemArray
           Console.Write(col & ",")
        Next
        Console.WriteLine()
     End If
  Next
End Sub

Private Function RowsDifferent(r1 As DataRow, r2 As DataRow) As Boolean
  For i As Integer = 0 To r1.Table.Columns.Count - 1
     If Not r1(i).Equals(r2(i)) Then Return True
  Next
  Return False
End Function

Looks like, thanks to enhancements in .NET Framework 4.0, you could even use a built-in function instead of defining your own RowsDifferent function:

  Dim diffs = _
     From row2 In t2 Group Join row1 In t1 _
     On row1(t1.PrimaryKey(0)) Equals row2(t2.PrimaryKey(0)) _
     And row1(t1.PrimaryKey(1)) Equals row2(t2.PrimaryKey(1)) _
     Into Group Where Not Group.Any OrElse _
        Not Group.Single.ItemArray.SequenceEqual(row2.ItemArray)

Upvotes: 0

Related Questions