AidanO
AidanO

Reputation: 878

Looking for a fast method to get a list of distinct values from a large database table

I have a table (lets call it 'actions') that lists all actions completed by a bunch of different clients.

I want to get a list of distinct action types for a specific client. I also know that there are 100 action types in the system

I've tried the obvious:

SELECT DISTINCT actionType FROM actions WHERE clientId = @clientId

But it's very slow due to the large number of records in the table.

I'm now working on a query that uses exists to get a list of actionTypes that I suspect will be faster:

SELECT 'actionType1' WHERE EXISTS (SELECT TOP 1 * FROM actions WHERE 
   clientId = @clientID AND actionType = 'actionType1' ) 
UNION 
SELECT 'actionType2' WHERE EXISTS (SELECT TOP 1 * FROM actions WHERE 
   clientId = @clientID AND actionType = 'actionType2' ) 
UNION
.
.
.
UNION
SELECT 'actionType100' WHERE EXISTS (SELECT TOP 1 * FROM actions WHERE 
       clientId = @clientID AND actionType = 'actionType100' ) 

While I expect this to be faster (and hopefully fast enough) it feels very hacky to me Can someone suggest a better way to do this?

Cheers, A

Update: The select distinct was taking 3 seconds yesterday, its now running instantly! I wonder if someone has added an index. I will check

As suspected by a few people, the second query is actually much slower than the distinct, it is taking 4 seconds to run!

Upvotes: 0

Views: 2232

Answers (3)

S.Karras
S.Karras

Reputation: 1493

Couple of design remarks here. If I understand correctly the actions are strings persisted in the actions table.

1) If you have a predefined set of types of actions my first recommendation would be to remove the strings and replace them with integers as action type ids, which would be a lookup integer value to the actual name of the action type in a separate action type table. Then, create a foreign key and a non-clustered index in the action type id (the index could be filtered in the non-null action types if you are interested mostly in those). That way you are lowering record size, clustered index size and overall performance, when filtering with action types.

2) If you cannot change the db design or the action types are too broad for a lookup table, you should create a non-clustered index in client_id with action_type as included column (this too could be filtered to non-null action type values to narrow the index).

Upvotes: 1

Caius Jard
Caius Jard

Reputation: 74605

How long does SELECT actionType FROM actions WHERE clientId = @clientId take to run? I susepct it's the major bottleneck, not the DISTINCT operation

Look at the query plan to be sure, but if clientID is not indexed, then it might well be a full table scan to pull the data required.

End of the day, a database engine is going to do a hashtable to get a distinct, (same as if you SELECT x FROM y GROUP BY x) and that's a fast op that needs access to all the data. YOu aren't going to make it any quicker decomposing it in any way at all; 99% of the speed will be down to the time taken to pull the data in; adding elements to a hashtable as the row reader is traversing the table is trivial

Upvotes: 1

Esteban P.
Esteban P.

Reputation: 2809

if you create a Clustered Index on actionType it should greatly improve the performance.

Upvotes: -1

Related Questions