bernie2436

Reputation: 23911

How can I speed up a dupe check before insert?

I have written a piece of lab-processing software that takes incoming data, breaks it up, and sends it to the proper fields in a database. The DB table is really big and has a lot of fields. Because we get repeat data from multiple sources, I run a dupe check before inserting.

The system works by parsing the incoming data and filling a linq-to-sql object. The dupe check performs a .Where operation on a list of linq-to-sql objects to select only the objects not yet in the table.

For instance ...

Dim input As List(Of TableA) ' linq-to-sql objects
Dim output = input.Where(
    Function(x As TableA) Not myDb.TableA.Any(
        Function(l As TableA) l.Name = x.Name AndAlso
                              l.Dob = x.Dob)).ToList() ' ...AndAlso etc. for 10 fields...

The syntax is a little arcane, but this was the only way I could find to match the linq-to-sql objects against the database records in one shot. Before I used this method, I went row by row performing a dupe check, which was way slower.

As I understand it, LINQ converts this statement to a SQL statement that runs on the server.

My question is: is there any way to make this run faster? Is there any reason to expect that writing out a SQL statement to do the deduping and running a traditional query would be faster? This statement is SLOW, but it works, is unit tested, and blocks the dupes. I am looking for something that is faster and equivalently clean (not hard, I know) or cleaner...

Upvotes: 0

Views: 184

Answers (2)

paparazzo

Reputation: 45096

If no other processes are adding to the SQL table, then you could read the table into a HashSet when the program starts. Check the local HashSet; if the record is not in the HashSet, add it to both the SQL table and the HashSet. A HashSet lookup is about 100 times faster than a SQL query, even running on the same physical box. I use this for some large loads.

If you are only going to get dups over a short time period, then you could skip loading the historic rows from the table at start, or load only the last X. Check that HashSet, and only if the record is not found use a stored procedure to insert; otherwise skip it. And periodically truncate the HashSet back to X.
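The basic preload-and-check pattern could be sketched like this in VB.NET (a sketch only: `myDb`, `TableA`, and the `Name`/`Dob` fields come from the question; the composite string key, the date format, and the `InsertOnSubmit` usage are my assumptions, and the real key would concatenate all ten fields):

```vb
' Build the HashSet once at startup from the existing rows.
' The key is a string concatenation of the dupe-check fields.
Dim seen As New HashSet(Of String)(
    myDb.TableA.AsEnumerable().Select(
        Function(r) r.Name & "|" & r.Dob.ToString("O")))

For Each x In input
    Dim key = x.Name & "|" & x.Dob.ToString("O") ' extend with the other fields
    ' HashSet.Add returns False when the key is already present,
    ' so this both checks and records the row in one call.
    If seen.Add(key) Then
        myDb.TableA.InsertOnSubmit(x)
    End If
Next
myDb.SubmitChanges()
```

Note the delimiter between fields matters: without it, "ab" + "c" and "a" + "bc" would collide as the same key.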

Upvotes: 0

Farshid Zaker

Reputation: 1990

You can define a unique index on your table based on name, dob and .... Each insert will then either succeed or raise a unique constraint violation exception, so you won't need any check before inserting. I think it's the most straightforward way of doing it.
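The index definition might look like this in T-SQL (a sketch only: the index name, schema, and column names are assumptions; the real index would list all ten dupe-check fields):

```sql
-- Unique index over the natural key used for the dupe check.
CREATE UNIQUE INDEX UX_TableA_NaturalKey
    ON dbo.TableA (name, dob /* , ...remaining key fields... */);
```

With this in place, a duplicate insert raises a duplicate-key error (2601/2627 in SQL Server), which surfaces in .NET as a SqlException the application can catch and treat as a skipped duplicate.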

Upvotes: 1
