Shevek
Shevek

Reputation: 4063

VB.Net - Efficient way of de-duplicating data

I am dealing with a legacy application which is written in VB.Net 2.0 against a SQL 2000 database.

There is a single table which has ~125,000 rows and 2 pairs of fields with similar data.

i.e. FieldA1, FieldB1, FieldA2, FieldB2

I need to process a combined, distinct list of FieldA, FieldB.

Using SQL I have confirmed that there are ~140,000 distinct rows.

Due to a very restrictive framework in the application I can only retrieve the data as either 2 XML objects, 2 DataTable objects or 2 DataTableReader objects. I am unable to execute custom SQL using the framework.

Due to a very restrictive DB access policy I am unable to add a View or Stored Proc to retrieve as a single list.

What is the most efficient way to combine the 2 XML / DataTable / DataTableReader objects into a single, distinct, IEnumerable object for later processing?

Upvotes: 0

Views: 548

Answers (2)

Shevek
Shevek

Reputation: 4063

This is the solution I came up with.

Combine the 2 DataTables using .Merge (thanks to Matt's answer)

Using this as a base I came up with the following code to get distinct rows from the DataTable based on 2 columns:

Private Shared Function GetDistinctRows(sourceTable As DataTable, ParamArray columnNames As String()) As DataTable

    Dim dt As New DataTable
    Dim sort = String.Empty

    For Each columnName As String In columnNames

        dt.Columns.Add(columnName, sourceTable.Columns(columnName).DataType)

        If sort.Length > 0 Then
            sort = sort & ","
        End If

        sort = sort & columnName

    Next

    Dim lastValue As DataRow = Nothing

    For Each dr As DataRow In sourceTable.Select("", sort)

        Dim add As Boolean = False

        If IsNothing(lastValue) Then
            add = True
        Else
            For Each columnName As String In columnNames
                If Not (lastValue(columnName).Equals(dr(columnName))) Then
                    add = True
                    Exit For
                End If
            Next
        End If

        If add Then
            lastValue = dr
            dt.ImportRow(dr)
        End If

    Next

    Return dt

End Function

Upvotes: 0

Matt Wilko
Matt Wilko

Reputation: 27342

I may have missed something here but could you not combine both DataTables using Merge?

DataTableA.Merge(DataTableB)

You can then use DataTableA.AsEnumerable()

Then see this answer on how to remove duplicates or

You can do this with a DataView as follows: dt.DefaultView.ToTable(True,[Column names])

Upvotes: 1

Related Questions