Reputation: 1273
I need to import many files into a database; custom business logic prevents the use of a simple SSIS package.
High-level description of the problem:
The problem with my approach: each row must be checked for duplicates. I assumed a round trip to the remote server to let SQL do the check would be too slow, so I opted for LINQ against an in-memory DataTable. The query is simple, but the size of the dataset makes it crawl (about 90% of the execution time is spent in this query checking the fields).
var existingRows = from row in recordDataTable.AsEnumerable()
                   where row.Field<int>("Entry") == entry
                         && row.Field<string>("Device") == dev
                   select row;
bool update = existingRows.Count() > 0;
What other ways might there be to more efficiently check for duplicates?
Upvotes: 1
Views: 1517
Reputation: 2804
Using LINQ, that query does a linear scan over your ~1M records every time you check for a duplicate.
You would be better off putting the data into a dictionary (or HashSet) keyed on the fields you match, so each lookup hits an in-memory index instead of scanning the whole table, as in the sketch below.
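A minimal sketch of that idea, assuming the same column names ("Entry", "Device") and local variables (entry, dev) as in the question: build the key set once, then each duplicate check is an O(1) hash lookup.

using System.Data;
using System.Linq;
using System.Collections.Generic;

// Build the lookup once, before processing the import files.
var existingKeys = new HashSet<(int Entry, string Device)>(
    recordDataTable.AsEnumerable()
        .Select(r => (r.Field<int>("Entry"), r.Field<string>("Device"))));

// Inside the per-row import loop: a hash lookup instead of a table scan.
bool update = existingKeys.Contains((entry, dev));

// If the row is new, add its key so later rows in the same run
// are also detected as duplicates.
if (!update)
    existingKeys.Add((entry, dev));

The same approach works with a Dictionary<(int, string), DataRow> if you also need the matching row in order to update it.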
Upvotes: 1