KVM
KVM

Reputation: 919

clustered columnstore index is slow

I have a quite big table with 400 columns and 100,000 rows. When I select all columns, the query is very slow (~7500ms). There is only a primary key set on 3 columns. I don't care about the speed of insert/update, this table is mostly for reads. I was reading about how Clustered Columnstore Index is an excellent match for my requirements, to improve read performance.

So I tried with the clustered columnstore index and the speed is almost the same (~7000ms). I really expected a much higher improvement. Am I missing something ?

enter image description here

Upvotes: 0

Views: 4425

Answers (4)

user1664043
user1664043

Reputation: 705

Found my way to this post with a tangentially related issue.

I have a large crosswalk table (500M+ rows) that has only a few columns - a 9-character numeric string id, and internal int id, and a status flag.

To me this seemed like a great use case for columnstore. I created a clustered columnstore index, and saved 87.5% of the storage space (yay!)

but then I happened to do a

SELECT TOP 1000 <numeric string id>
FROM crosswalk
ORDER BY <numeric string id> asc

and it took 30 seconds. Leaving off the ORDER BY they come out fast but in a pretty random order. Doing the same query on a copy of the table where it's the primary key comes out very fast.

I was very surprised doing a single-column query on a columnstore table was so slow.

I guess PK being literally sorted that way in the storage really makes ordered grabs fast while btrees add overhead. Just wasn't expecting that much overhead

Upvotes: 0

Adrian
Adrian

Reputation: 3438

I just had a similar problem, we currently have a table with 176 million rows. I'm not a db admin at all and found out the hard way that a column store index isn't a silver bullet. As you observed, the more columns you select the slower it gets.

One way I solved this was using the column store index in my WHERE clause and retrieve the PK of the row I needed. Then it was a matter of running a SELECT * with the PK which results in a simple clustered index seek on the PK.

This might be something to try out.

Upvotes: 0

Cyndi Baker
Cyndi Baker

Reputation: 679

You are going to see the real performance gains when selecting only some of the 400 columns. In traditional rowstore all columns always have to be accessed when you select that row, even if you were only selecting a few columns. With columnstore, if you only selected, say, 100 of your 400 columns, the query should be roughly four times faster with 25% of the logical reads. With select * you will not see much of an improvement.

Upvotes: 5

Ben Link
Ben Link

Reputation: 107

To answer this question, I need to know your query and the columns definition of the fields in the where clause.

It is very important that these are of the same datatype to be sure that the index is used. Sometimes conversion has to be done (timestamp to date or char to date etc.) which makes it impossible to use the index.

Upvotes: -1

Related Questions