Chaim Shtokhamer
Chaim Shtokhamer

Reputation: 55

Efficient search of a phone number between several columns in the table

My table has 4 columns: Customer name, phone 1, phone 2, phone 3.

I need to check if a particular phone number exists in one of the three columns.

My table looks like this: Customerֹ_table

Customer name phone 1 phone 2 phone 3
MOP 123454678 2345556789 346757890
AVI 456735673 3453546354 235634563
DEN 125345678 2345635635 123454678

Here's the query I wrote -

  SELECT DISTINCT COUSTMER_NAME
  FROM Customerֹ_table 
  WHERE NUM_1 OR NUM_2 OR NUM_3 OR NUM_4 =  xxxxxxxxx

Is it right to do the query like that? I am looking for the maximum efficiency possible.

Upvotes: 0

Views: 1041

Answers (4)

Geoff Griswald
Geoff Griswald

Reputation: 1093

Ignore what others have said about redesigning your data structure, this absolutely will not help in SQL Server. Maybe in Oracle or MySQL whatever, but SQL Server is smart enough to use a flat table structure like you have efficiently.

Here are the Top 4 ways to improve the efficiency of your query:

One: Use GROUP BY instead of DISTINCT. DISTINCT is a real performance killer and shouldn't ever be used if you want efficient code. It's a nice shorthand hacky way of getting a unique result, but use GROUP BY for performance.

Two: Use WITH(NOLOCK) to speed up your table read. This can be controversial, but it really shouldn't be. SQL Server 2017 and beyond have lots of nice safety features which make legacy concerns about using NOLOCK for analysis null and void.

Three: Make one of the columns in your data table a PRIMARY KEY. This is important for all kinds of reasons, but in particular performance. SQL Server relies on having a Primary Key so that its automatic performance enhancements and dynamic query optimisation plans can work. In general, the more often you run a query against a table with a Primary Key, the faster SQL will get at delivering you results.

Four: Once you've added a Primary Key, add a separate Nonclustered index for each of your Telephone Number columns. If your table is very large, this would speed up the query a lot. Bear in mind that adding indexes on a very large table takes up a lot of disk space.

Ideally your CUSTOMER_NAME field would be unique, so you could make that your primary key. If it isn't, you'll need to add an ID column as per Yitzhak Khabinsky's answer.

So your query would look like this:

  SELECT COUSTMER_NAME
    FROM Customerֹ_table WITH(NOLOCK)
   WHERE NUM_1 OR NUM_2 OR NUM_3 OR NUM_4 = xxxxxxxxx
GROUP BY COUSTMER_NAME

Assuming the table's Primary Key is on the COUSTMER_NAME column.

For bonus points:

Make sure all columns in your data table are set to NOT NULL. SQL Server will attempt to create dynamic indexes to help speed up your queries, but it won't do so on any columns flagged as nullable. The only exceptions are dates, where you often have no option but to allow for NULLs, so have to take the performance hit.

Use Varchars and not NVarchars. Unless you have some reason to use an NVarchar, like you are capturing extended characters like Chinese, Arabic etc, then don't. They take up twice the space, take longer to scan and take longer to perform logic on such as matching a value in your WHERE clause

Keep your column lengths as short as reasonably possible. So if a Telephone number can only ever be 20 digits long, then set that column to 25 or maybe 30 digits max, just in case you start getting longer phone numbers in the future. If you know your customer ID is only ever going to be 3 digits long, set that field to max 3 digits. Keeping the max size of your data tables trimmed down to the minimum (while leaving enough overhead to reasonably cope with future requirements) really helps with performance.

Upvotes: 1

Stu
Stu

Reputation: 32629

For the best performance, you need to normalise your data - this would allow you to have not just 3 phone numbers but any number - you can add a flag to soft delete, indicate a primary number etc, it's much more manageable than adding N columns in a flat table.

Add a PhoneNumbers table - a bare bones table might be

create table PhoneNumbers (
    customerId int,
    PhoneNumber varchar(20),
    ... any other related data - eg a flag to indicate the primary number etc
)

alter table Phonenumber add constraint PK_PhoneNumbers primary key nonclustered (CustomerId, Phonenumber)
create unique clustered index IdxCustomerId on (Phonenumber,CustomerId)

Then to find a customer with a specific number

select c.CustomerName
from PhoneNumbers n
join Customers c on c.customerId=n.customerId 
where n.Phonenumber='12345678'

Upvotes: 1

Yitzhak Khabinsky
Yitzhak Khabinsky

Reputation: 22275

Please try the following solution.

SQL

-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, customer NVARCHAR(20), phone1 VARCHAR(10), phone2 VARCHAR(10), phone3 VARCHAR(10));
INSERT INTO @tbl (customer, phone1, phone2, phone3) VALUES
(N'MOP', '123454678', '2345556789', '346757890'),
(N'AVI', '456735673', '3453546354', '235634563'),
(N'DEN', '125345678', '2345635635', '123454678');
-- DDL and sample data population, end

DECLARE @phone VARCHAR(10) = '2345635635';

SELECT * 
FROM @tbl
WHERE @phone IN (phone1, phone2, phone3);

Output

+----+----------+-----------+------------+-----------+
| ID | customer |  phone1   |   phone2   |  phone3   |
+----+----------+-----------+------------+-----------+
|  3 | DEN      | 125345678 | 2345635635 | 123454678 |
+----+----------+-----------+------------+-----------+

Upvotes: 1

Nicholas Hunter
Nicholas Hunter

Reputation: 1845

For "maximum efficiency" you need to do what Stu said and redesign your database structure. Meanwhile, here's a query that will work with your existing database structure.

SELECT DISTINCT CUSTOMER_NAME
FROM Customerֹ_table 
WHERE NUM_1 = xxxxxxxxx
OR NUM_2 = xxxxxxxxx
OR NUM_3 = xxxxxxxxx
OR NUM_4 = xxxxxxxxx

Upvotes: 0

Related Questions