Boris Callens

Reputation: 93417

Is this execution plan a motivation for rethinking my primary keys?

When I joined my current employer, a new database schema had just been designed, and it will be the basis for a lot of the tools that are being or will be created. With my limited SQL knowledge, I think it is rather well designed. My only concern is that almost every table has a multi-part primary key. Every table has at least a CustomerId and a key of its own. While these do indeed define a given record, I have the feeling that multi-column keys (we're talking four columns here) are very inefficient.

Today I saw very high CPU usage from a simple, repeated query that joins two tables, selects a single string field from the first, and returns the distinct values.

select distinct(f.FIELDNAME) as fieldName
from foo f
inner join bar b
   on f.id = b.fId
where b.cId = @id;

Checking the execution plan (I'm no EP hero), I noticed that there are three major CPU points: the distinct (as expected) and two seeks over the indexes. I would have thought that index seeks should be extremely fast, but they take up 18% of the cost each. Is this normal? Is it due to the (four-column) clustered indexes?

--UPDATE--
The query is used for creating a Lucene index. It's a one-off processing job that runs about weekly (sounds contradictory, I know). I can't reuse any results here as far as I can see.

Upvotes: 1

Views: 170

Answers (3)

Jeff Ferland

Reputation: 18322

In most databases, a multi-column index can't be used efficiently if the query doesn't filter or join on the first column in the index. You say that the customerId is part of every primary key, but you don't use it for the join in your query. To properly answer your question, we really need to see the CREATE TABLE output for foo and bar, or at least the output of SHOW INDEX FROM for each.

That said, your query may be faster if you change it like so:

select distinct(f.FIELDNAME) as fieldName
from foo f
inner join bar b
   on f.id = b.fId
   and f.cId = b.cId -- using this part of the key will speed it up
where b.cId = @id;

My comment assumes that your primary key is ordered as "cId, fId". Effectively, that means the query doesn't have to check every cId, only the ones that match through the index.
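To illustrate what I mean by key order, here is a rough sketch. The definition of bar is an assumption on my part (column types and a two-column key are guesses, since we haven't seen the real schema), and the literal 42 just stands in for @id:

```sql
-- Hypothetical definition: types and key columns are assumed, not taken from the real schema.
CREATE TABLE bar (
    cId INT NOT NULL,
    fId INT NOT NULL,
    PRIMARY KEY (cId, fId)   -- cId is the leading column of the key
);

-- Filters on the leading column, so the key can be used to seek directly.
SELECT fId FROM bar WHERE cId = 42;

-- Filters only on the second column; the (cId, fId) key typically cannot be
-- used for a seek here, so expect a scan unless another index leads with fId.
SELECT cId FROM bar WHERE fId = 42;
```

Whether the second query really scans depends on the engine and on any other indexes you have, so check the actual plan.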

Upvotes: 0

Quassnoi

Reputation: 425843

Could you please run the following queries and post their output:

SELECT  COUNT(*), COUNT(DISTINCT fieldname)
FROM    foo

SELECT  COUNT(*), COUNT(DISTINCT cId), COUNT(DISTINCT fId)
FROM    bar

This will help to estimate which indexes best suit your needs.

Meanwhile make sure you have the following indexes:

foo (FIELDNAME)
bar (cId, fId)

and rewrite your query:

SELECT  DISTINCT(fieldname)
FROM    foo f
WHERE   EXISTS (
        SELECT  1
        FROM    bar b
        WHERE   b.fId = f.id
                AND b.cId = @id
        )

This query should use an index on f.FIELDNAME to build the DISTINCT list and the index on bar to filter out the non-existent values.
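For reference, the two indexes listed above could be created roughly like this. The index names are made up for illustration, and this is plain CREATE INDEX syntax, so adjust it for your engine:

```sql
-- Hypothetical index names; table and column names are the ones from the question.

-- Lets the engine read FIELDNAME values in order to build the DISTINCT list.
CREATE INDEX ix_foo_fieldname ON foo (FIELDNAME);

-- Supports the EXISTS lookup: equality on cId first, then on fId.
CREATE INDEX ix_bar_cid_fid ON bar (cId, fId);
```

If FIELDNAME is a very long string type, your engine may not allow it as an index key as-is, in which case this part needs rethinking.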

Upvotes: 3

Sam Saffron

Reputation: 131202

This kind of query looks familiar. I'm guessing here, but it's probably populating a combo box in a web/WinForms UI that is being hit pretty hard.

Perhaps you should be caching the results on the application side so you don't end up executing it so often. Worst case, you could cache this on the SQL Server side, but it's a massive kludge.

Upvotes: 1
