Reputation: 878
I have a table (lets call it 'actions') that lists all actions completed by a bunch of different clients.
I want to get a list of distinct action types for a specific client. I also know that there are 100 action types in the system
I've tried the obvious:
SELECT DISTINCT actionType FROM actions WHERE clientId = @clientId
But it's very slow due to the large number of records in the table.
I'm now working on a query that uses exists to get a list of actionTypes that I suspect will be faster:
SELECT 'actionType1' WHERE EXISTS (SELECT TOP 1 * FROM actions WHERE
clientId = @clientID AND actionType = 'actionType1' )
UNION
SELECT 'actionType2' WHERE EXISTS (SELECT TOP 1 * FROM actions WHERE
clientId = @clientID AND actionType = 'actionType2' )
UNION
.
.
.
UNION
SELECT 'actionType100' WHERE EXISTS (SELECT TOP 1 * FROM actions WHERE
clientId = @clientID AND actionType = 'actionType100' )
While I expect this to be faster (and hopefully fast enough) it feels very hacky to me Can someone suggest a better way to do this?
Cheers, A
Update: The select distinct was taking 3 seconds yesterday, its now running instantly! I wonder if someone has added an index. I will check
As suspected by a few people, the second query is actually much slower than the distinct, it is taking 4 seconds to run!
Upvotes: 0
Views: 2232
Reputation: 1493
Couple of design remarks here. If I understand correctly the actions are strings persisted in the actions table.
1) If you have a predefined set of types of actions my first recommendation would be to remove the strings and replace them with integers as action type ids, which would be a lookup integer value to the actual name of the action type in a separate action type table. Then, create a foreign key and a non-clustered index in the action type id (the index could be filtered in the non-null action types if you are interested mostly in those). That way you are lowering record size, clustered index size and overall performance, when filtering with action types.
2) If you cannot change the db design or the action types are too broad for a lookup table, you should create a non-clustered index in client_id with action_type as included column (this too could be filtered to non-null action type values to narrow the index).
Upvotes: 1
Reputation: 74605
How long does SELECT actionType FROM actions WHERE clientId = @clientId
take to run? I susepct it's the major bottleneck, not the DISTINCT operation
Look at the query plan to be sure, but if clientID is not indexed, then it might well be a full table scan to pull the data required.
End of the day, a database engine is going to do a hashtable to get a distinct, (same as if you SELECT x FROM y GROUP BY x) and that's a fast op that needs access to all the data. YOu aren't going to make it any quicker decomposing it in any way at all; 99% of the speed will be down to the time taken to pull the data in; adding elements to a hashtable as the row reader is traversing the table is trivial
Upvotes: 1
Reputation: 2809
if you create a Clustered Index on actionType it should greatly improve the performance.
Upvotes: -1