Reputation: 2828
I have two DataTables, A and B, with approx. 50,000 rows each. I am comparing some of the columns (not the complete row) of table B against table A, using two foreach loops. The comparison takes 2-3 min to complete. Is there a way to improve performance?
foreach (DataRow entry in B.Rows)
{
    // Compare results
    foreach (DataRow dr in A.Rows)
    {
        if (entry[1].ToString().ToUpper() == dr[2].ToString().ToUpper() &&
            entry[2].ToString().ToUpper() == dr[3].ToString().ToUpper())
        {
            // do stuff....
            entry[1] = dr[2];
            entry[2] = dr[3];
            break;
        }
    }
}
Upvotes: 1
Views: 1475
Reputation: 460158
You can use LINQ to DataSet; Enumerable.Join is much more efficient since it uses a Dictionary/Lookup under the hood. (Note that the join below compares the keys case-sensitively; your loop uses ToUpper, so apply ToUpper to the key columns if you need case-insensitive matching.)
var join = from rowA in A.AsEnumerable()
           join rowB in B.AsEnumerable()
           on new { col1 = rowA.Field<string>(2), col2 = rowA.Field<string>(3) }
           equals new { col1 = rowB.Field<string>(1), col2 = rowB.Field<string>(2) }
           select new { rowA, rowB };
foreach (var bothRows in join)
{
    DataRow rowA = bothRows.rowA;
    DataRow rowB = bothRows.rowB;
    // do stuff....
}
Why is LINQ JOIN so much faster than linking with WHERE?
Update, according to your comment:
I need to find and mark in the B table rows which are not in the A table. So basically on your proposed solution I would like to get join which will have like (entry1 != dr[2] || entry[2] != dr[3]))
Then you could use a combination of Enumerable.Except and Join:
var bKeyCols = B.AsEnumerable()
    .Select(r => new { col1 = r.Field<string>(1), col2 = r.Field<string>(2) });
var aKeyCols = A.AsEnumerable()
    .Select(r => new { col1 = r.Field<string>(2), col2 = r.Field<string>(3) });
var bNotInA = bKeyCols.Except(aKeyCols);
var bRowsNotInA = from row in B.AsEnumerable()
                  join keyCols in bNotInA
                  on new { col1 = row.Field<string>(1), col2 = row.Field<string>(2) } equals keyCols
                  select row;
Enumerable.Except also uses a set internally, which makes it very efficient at the cost of some memory.
Upvotes: 3
Reputation: 69260
What you are doing has O(n^2) complexity, which is bad performance-wise. If it is slow now, it will get even worse as you add more rows to the tables.
The algorithm would be considerably faster if you sorted both tables first. In that case you could loop through both tables at the same time, merge-style.
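A minimal sketch of that sorted-merge idea, assuming the question's tables A and B with the same column indices (B's columns 1/2 matched against A's columns 2/3), and ignoring duplicate keys for brevity:

```csharp
using System;
using System.Data;
using System.Linq;

// Composite key: upper-cased key columns joined by a separator that
// should not occur in the data (an assumption).
string KeyA(DataRow r) => r[2].ToString().ToUpper() + "\u0001" + r[3].ToString().ToUpper();
string KeyB(DataRow r) => r[1].ToString().ToUpper() + "\u0001" + r[2].ToString().ToUpper();

var aSorted = A.AsEnumerable().OrderBy(KeyA, StringComparer.Ordinal).ToList();
var bSorted = B.AsEnumerable().OrderBy(KeyB, StringComparer.Ordinal).ToList();

int i = 0, j = 0;
while (i < aSorted.Count && j < bSorted.Count)
{
    int cmp = string.CompareOrdinal(KeyA(aSorted[i]), KeyB(bSorted[j]));
    if (cmp < 0) i++;          // A's key is smaller: advance A
    else if (cmp > 0) j++;     // B's key is smaller: advance B
    else
    {
        // keys match: do stuff with aSorted[i] and bSorted[j]
        i++;
        j++;
    }
}
```

Sorting costs O(n log n) and the merge pass is O(n), so the whole thing is O(n log n) instead of O(n^2).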
Another, simpler option (though not as fast) is to convert the first table to a dictionary instead, to allow constant-time lookups while looping through the second table.
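A sketch of the dictionary option, again assuming the question's tables A and B and column indices (B's columns 1/2 probed against A's columns 2/3):

```csharp
using System.Collections.Generic;
using System.Data;

// Index A once by its upper-cased key columns. The separator character
// is assumed not to occur in the data.
var index = new Dictionary<string, DataRow>();
foreach (DataRow dr in A.Rows)
{
    string key = dr[2].ToString().ToUpper() + "\u0001" + dr[3].ToString().ToUpper();
    if (!index.ContainsKey(key))   // keep the first match, like the original inner loop's break
        index[key] = dr;
}

// One O(1) lookup per row of B instead of a full scan of A.
foreach (DataRow entry in B.Rows)
{
    string key = entry[1].ToString().ToUpper() + "\u0001" + entry[2].ToString().ToUpper();
    if (index.TryGetValue(key, out DataRow dr))
    {
        // do stuff....
    }
}
```

This is essentially what Enumerable.Join does for you behind the scenes.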
It should be possible to do this with LINQ instead; that way you won't have to implement the heavy algorithm yourself.
var matching = from br in B.AsEnumerable()
               join ar in A.AsEnumerable()
               on br.Field<string>(1) equals ar.Field<string>(2)
               where br.Field<string>(2) == ar.Field<string>(3)
               select new { ar, br };

foreach (var match in matching)
{
    // Do stuff.
}
Upvotes: 3
Reputation: 62256
It's hard to say where the real problem is, but you may try to use the DataTable's built-in query capability.
Pseudocode:
DataView dv = new DataView(A);
foreach (DataRow entry in B.Rows)
{
    // String values must be quoted in a RowFilter expression
    dv.RowFilter = "COLUMN_A1 = '" + entry[1] + "' AND COLUMN_A2 = '" + entry[2] + "'";
    foreach (DataRowView drv in dv)
    {
        // DO SOMETHING
    }
}
where COLUMN_A1 and COLUMN_A2 are the respective column names from table A.
Consider that in this case you will have increased memory pressure.
Upvotes: 2