dspiegs
dspiegs

Reputation: 568

How to efficiently compare two data tables in C#

I am working on a method that takes in two datatables and a list of primary key column names and gives back the matches. I do not have any other info about the tables.

I have searched the site for a solution to this problem and have found some answers, but none have given me a fast enough solution.

Based on results from stackoverflow I now have this:

var matches =
    (from rowA in tableA.AsEnumerable()
     from rowB in tableB.AsEnumerable()
     where primaryKeyColumnNames.All(column => rowA[column].ToString() == rowB[column].ToString())
     select new { rowA, rowB });

The problem is this is REALLY slow. It takes 4 minutes for two tables of 8000 rows each. Before I came to stackoverflow I was actually iterating through the columns and rows it took 2 minutes. (so this is actually slower than what I had) 2-4 minutes doesn't seem so bad until I hit the table with 350,000 rows. It takes days. I need to find a better solution.

Can anyone think of a way for this be faster?

Edit: Per a suggestion from tinstaafl this is now my code.

var matches = tableA.Rows.Cast<DataRow>().Select(rowA => new 
{
    rowA,
    rowB = tableB.Rows.Find(rowA.ItemArray.Where((x, y) => 
        primaryKeyColumnNames.Contains(tableA.Columns[y].ColumnName,
               StringComparer.InvariantCultureIgnoreCase)).ToArray())
})
.Where(x => x.rowB != null);    

Upvotes: 2

Views: 12905

Answers (1)

tinstaafl
tinstaafl

Reputation: 6948

Using the PrimaryKey property of the DataTable, which will accept an array of columns, should help. Perhaps something like this:

tableA.PrimaryKey = primaryKeyColumnNames.Select(x => tableA.Columns[x]).ToArray();
tableB.PrimaryKey = primaryKeyColumnNames.Select(x => tableB.Columns[x]).ToArray();
var matches = (from System.Data.DataRow RowA in tableA.Rows
                where tableB.Rows.Contains(RowA.ItemArray.Where((x,y) => primaryKeyColumnNames.Contains(tableA.Columns[y].ColumnName)).ToArray())
                select RowA).ToList();

In a test with 2 tables with 9900 rows and returning 9800 as common, this took about 1/3 of a second.

Upvotes: 3

Related Questions