Christian
Christian

Reputation: 3972

Select rows from one DataTable not in another

I'm trying to get a list of rows in DataTableA where the value in Column 1 is not in Column1 of DataTableB.

I'm using the following LinQ query

//Not in Database
var query = from i in dtImport.AsEnumerable()
            where !dtProducts.AsEnumerable().Any(p => p[colP] == i[colI])
            select i;

Such that I want a list of products in the import table that aren't already in the products table.

Which seems to skip past the line quickly when I'm debugging but then when I call anything relating to that query such as int rows = query.Count<DataRow>(); or DataTable dtResult = query.CopyToDataTable(); it seems to take a long time so I just stop the program.

So, What am I doing wrong?

Upvotes: 2

Views: 9914

Answers (3)

Olivier Jacot-Descombes
Olivier Jacot-Descombes

Reputation: 112352

Your query is slow, because it has to enumerate the products for each record in dtImport. Put the products into a dictionary first, to speed up your query.

var prod = dtProducts.AsEnumerable().ToDictionary(p => p[colP]);  
var query = from imp in dtImport.AsEnumerable()  
            where !prod.ContainsKey(imp[colI])  
            select imp;  

Upvotes: 1

Magnus
Magnus

Reputation: 46929

Linq uses deferred execution. The query is executed when it is used (not when declared)
For better performance you can use a HashSet like the following;

var set = new HashSet<int>(dtProducts.AsEnumerable().Select(p => p.colP));
var result = dtImport.AsEnumerable().Where(i => !set.Contains(i[colI])).ToList();

Upvotes: 4

Sergey Kalinichenko
Sergey Kalinichenko

Reputation: 726579

The slowdown is expected: the query does not get evaluated until you enumerate the results, so you skip this line in the debugger pretty quickly: all it does is preparing to query the data source; the querying is done on enumerating the results.

As far as I can tell without profiling your code, the issue is probably related to a big out-of-db select that happens when you convert dtProducts and dtImport to IEnumerable: essentially, you bring the data from both tables into memory before doing your select. If your tables are of considerable size, this is probably where most of the time goes. But again, the only sure way to tell is profiling.

Upvotes: 3

Related Questions