Reputation: 2883
For my current project we want to present statistical data and rank it. For my case I'm talking about "Favouriting" of an artist, counting the times an artist's track has been played, displaying a count of how many playlists an artists track has been added to a playlist... These are all very domain specific issues, but it's a concrete example of my issue.
The main issue is that I'm going to be returning result sets that are returned in order for all these statistical attributes.
Here are some examples:
My first thought has determined I need a computed aggregate column. Since I want to order on these values that means a CLUSTERED INDEX would be optimal on each aggregate I want to order by. Secondly, since DML on CLUSTERED INDEX columns can be costly when they are not sequential on insertion I need to make this a scheduled job.
So, for the artist favourite stats, here's the DDL that I have come up with. Noted my T-SQL might be horribly off, but I think the intentions are clear.
CREATE TABLE Stats_ArtistFavourites (
FavouriteCount INT DEFAULT 0,
ArtistId INT PRIMARY KEY NONCLUSTERED,
FOREIGN KEY (ArtistId) REFERENCES Artists
)
CREATED CLUSTERED INDEX IDX_Favourites
ON Stats_ArtistFavourites (FavouriteCount, ArtistId) DESC
So as you can see, I would need to create a separate table for each stat I want to keep track of, otherwise I would have to ORDER BY columns that aren't in the CLUSTERED INDEX. The fact that this seems ugly makes me think I'm going about it all wrong.
Should I start thinking about integrating OLAP (I have very little experience with OLAP cubes)? Or maybe Lucene?
Upvotes: 0
Views: 98
Reputation: 8043
Have you considered using RANK? You may be surprised by the performance.
Upvotes: 0
Reputation: 7678
You might explore indexed views. http://technet.microsoft.com/en-us/library/cc917715.aspx
The first point looks like what you are after.
Upvotes: 0
Reputation: 425441
Scanning by a plain index is akin to a join, as plain indices contain a indexed value along with a reference to a table block in each leaf. To extract a non-indexed value you need to "join" the table by this block reference.
Clustered indexes, on the contrary, contain the table data itself it each leaf, you get the non-indexed field values right on scan.
As long as you select 5 top records, it's OK to use plain index, as one single table is always more simple to manage.
It will be a little slower than a cluster index, as this will imply the "join" desribed above, but it's only 5 records, you will hardly note any difference.
You may even create your statistics table as such:
CREATE TABLE stats (type INTEGER, score INTEGER, artist INTEGER);
CREATE INDEX ix_stats (type, score);
, this will help you to add new aggregated values more easily.
1
for type
here can mean how many time the artist is played
, 2
how many times he is favorited
and so on. When you need new aggregate, you just make a new type and INSERT
5 new rows into the table instead of changing its definition.
Again, if I understand you task, we are talking about selecting dozens records from this table. In this case, manageability is more important than selecting these top 5 artists 10 milliseconds faster.
Upvotes: 2