SNpn

Reputation: 2207

Checking Duplicates before inserting into SQL database

So I've been doing some research and I need to write an INSERT statement to insert unique client names into a table on my server. However, the database already contains thousands of clients, and when inserting new clients we need to check whether they already exist before attempting to add them to the system.

My question is: what would be the best/fastest way to do this? Would it be better to run a simple SELECT on the clients table (ordered ascending) and do a binary search on the results, or just run a SQL statement similar to the one below?

IF NOT EXISTS (SELECT 1 FROM clients AS c WHERE c.clientname = ?)
BEGIN
  INSERT INTO clients (clientname, address, ...)
  VALUES (?, ?, ...)
END

Is this a slow statement? I may have to run the insert several hundred times per submission.

Upvotes: 3

Views: 9970

Answers (2)

yankee

Reputation: 40740

It is not uncommon to estimate the cost of a SQL query in terms of disk operations: reading or writing one block (typically 8 KB) is the unit of cost. (In-memory databases change this line of thought somewhat.)

If you have hundreds, possibly thousands, of items, and each item is, say, 20 bytes, then your whole table may fit in a single block on disk (about 400 items per block). Maybe it needs a few more blocks, but either way it is a negligibly small number. With such a small database, the data will probably sit in the DBMS's memory cache, and you only pay for write access. As the database grows, an index keeps the number of block accesses per lookup small: roughly logarithmic in the table size rather than linear.

Neither your solution nor Bill's causes any write access when the item is already present in the database, so both should be equally fast.

The interesting part would be:

I may have to run the insert several hundred times per submission.

That would mean writing one and the same block on disk hundreds of times. It would be faster to do this in a single step. However, I am not aware of a portable SQL construct for this. MySQL's INSERT does offer a way to specify several rows of values in a single statement. This MIGHT be a considerable plus (I don't know how smartly MySQL handles this situation), but it is specific to MySQL and not portable.
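As a sketch of the multi-row idea (using Python's sqlite3 purely for illustration; the table and column names are assumptions, and SQLite also accepts the multi-row VALUES form):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (clientname TEXT, address TEXT)")

new_clients = [("alice", "1 Main St"), ("bob", "2 Oak Ave"), ("carol", "3 Elm Rd")]

# Build one multi-row INSERT instead of hundreds of single-row statements,
# so the DBMS can write the affected block(s) once.
placeholders = ", ".join(["(?, ?)"] * len(new_clients))
params = [value for row in new_clients for value in row]
conn.execute(f"INSERT INTO clients (clientname, address) VALUES {placeholders}", params)
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM clients").fetchone()[0])  # 3
```

Batching also amortizes per-statement overhead (parsing, round trips), not just block writes.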

Another way to speed things up is to not wait until the changed blocks are written to disk. This comes at the risk of losing data without notice, but it can be a significant performance boost. Again, this is specific to the DBMS you use. E.g. if you use MySQL with InnoDB, you can set the option innodb_flush_log_at_trx_commit=0 in your my.ini to achieve this behaviour.
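For reference, the relevant my.ini/my.cnf fragment would look roughly like this (this trades durability for speed; the default value of 1 is the safe, fully durable setting):

```ini
[mysqld]
# Flush the InnoDB log to disk about once per second instead of at every
# transaction commit; a crash can lose up to ~1 second of transactions.
innodb_flush_log_at_trx_commit = 0
```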

Would it be better to run a simple SELECT on the clients table (ordered ascending) and do a binary search on the results

This would needlessly copy large amounts of data from the DBMS to the client (which may well be a different machine, communicating over a network protocol). It would still be OK for your small database, but it does not scale. It would only be of use if it helped you save the data to disk in a single operation.

Upvotes: 2

Bill Karwin

Reputation: 562230

The standard advice is to create a UNIQUE constraint if you want a given column to be unique.

ALTER TABLE clients ADD UNIQUE KEY (clientname);

Then try the INSERT: it will succeed if there is no matching row, and fail with a duplicate-key error if there is. No SELECT is necessary.
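A minimal sketch of this pattern (using Python's sqlite3 for illustration; the table and column names are assumptions, and other drivers raise an analogous duplicate-key error):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The UNIQUE constraint makes the database itself enforce uniqueness.
conn.execute("CREATE TABLE clients (clientname TEXT UNIQUE, address TEXT)")

def insert_client(name, address):
    """Just try the INSERT; report whether a new row was actually added."""
    try:
        conn.execute(
            "INSERT INTO clients (clientname, address) VALUES (?, ?)",
            (name, address),
        )
        return True
    except sqlite3.IntegrityError:  # duplicate clientname: the constraint rejected it
        return False

print(insert_client("alice", "1 Main St"))  # True  (new row inserted)
print(insert_client("alice", "1 Main St"))  # False (duplicate rejected)
```

This is also race-free: two concurrent inserts of the same name cannot both succeed, which a separate SELECT-then-INSERT cannot guarantee on its own.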

Upvotes: 3
