Reputation: 1273
I need to import many files into a database; custom business logic prevents the use of a simple SSIS package.
High-level description of the problem:
The problem with my approach: each row must be checked for duplicates. I assumed a round trip to the remote server to let SQL do the check would be too slow, so I opted for LINQ against an in-memory DataTable. The query is simple, but the size of the dataset makes it crawl (about 90% of the execution time is spent in this query checking the fields).
var existingRows = from row in recordDataTable.AsEnumerable()
                   where row.Field<int>("Entry") == entry
                         && row.Field<string>("Device") == dev
                   select row;
bool update = existingRows.Count() > 0;
What other ways might there be to more efficiently check for duplicates?
Upvotes: 1
Views: 1517
Reputation: 2804
Using LINQ, that query does a linear scan over your ~1M records every time you check for a duplicate.
You would be better off putting the data into a dictionary (or HashSet) keyed on the fields you match, so each lookup hits an in-memory index instead of scanning the whole table, as in the sketch below.
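A minimal sketch of that idea, assuming the same column names ("Entry", "Device") and local variables (entry, dev) as in the question: build the key set once, then each duplicate check is an O(1) hash lookup.

using System.Data;
using System.Linq;
using System.Collections.Generic;

// Build the lookup once, before processing the import files.
var existingKeys = new HashSet<(int Entry, string Device)>(
    recordDataTable.AsEnumerable()
        .Select(r => (r.Field<int>("Entry"), r.Field<string>("Device"))));

// Inside the per-row import loop: a hash lookup instead of a table scan.
bool update = existingKeys.Contains((entry, dev));

// If the row is new, add its key so later rows in the same run
// are also detected as duplicates.
if (!update)
    existingKeys.Add((entry, dev));

The same approach works with a Dictionary<(int, string), DataRow> if you also need the matching row in order to update it.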
Upvotes: 1