Reputation: 568
I am working on a method that takes in two datatables and a list of primary key column names and gives back the matches. I do not have any other info about the tables.
I have searched the site for a solution to this problem and have found some answers, but none have given me a fast enough solution.
Based on results from stackoverflow I now have this:
var matches =
(from rowA in tableA.AsEnumerable()
from rowB in tableB.AsEnumerable()
where primaryKeyColumnNames.All(column => rowA[column].ToString() == rowB[column].ToString())
select new { rowA, rowB });
The problem is this is REALLY slow. It takes 4 minutes for two tables of 8000 rows each. Before I came to stackoverflow I was actually iterating through the columns and rows it took 2 minutes. (so this is actually slower than what I had) 2-4 minutes doesn't seem so bad until I hit the table with 350,000 rows. It takes days. I need to find a better solution.
Can anyone think of a way for this be faster?
Edit: Per a suggestion from tinstaafl this is now my code.
var matches = tableA.Rows.Cast<DataRow>().Select(rowA => new
{
rowA,
rowB = tableB.Rows.Find(rowA.ItemArray.Where((x, y) =>
primaryKeyColumnNames.Contains(tableA.Columns[y].ColumnName,
StringComparer.InvariantCultureIgnoreCase)).ToArray())
})
.Where(x => x.rowB != null);
Upvotes: 2
Views: 12905
Reputation: 6948
Using the PrimaryKey
property of the DataTable
, which will accept an array of columns, should help. Perhaps something like this:
tableA.PrimaryKey = primaryKeyColumnNames.Select(x => tableA.Columns[x]).ToArray();
tableB.PrimaryKey = primaryKeyColumnNames.Select(x => tableB.Columns[x]).ToArray();
var matches = (from System.Data.DataRow RowA in tableA.Rows
where tableB.Rows.Contains(RowA.ItemArray.Where((x,y) => primaryKeyColumnNames.Contains(tableA.Columns[y].ColumnName)).ToArray())
select RowA).ToList();
In a test with 2 tables with 9900 rows and returning 9800 as common, this took about 1/3 of a second.
Upvotes: 3