Dilip Kr Singh
Dilip Kr Singh

Reputation: 1550

What is a columnstore index and how is different from clustered and non-clustered indexes?

I am confused about the columnstore index.

What is a columnstore index, and how it is different from clustered and non-clustered indexes?

Upvotes: 20

Views: 29954

Answers (3)

TheGameiswar
TheGameiswar

Reputation: 28890

Assume you have a table like below with col1 as primary key:

col1 (PK) col2 col3
1 2 3
4 5 6

Normal indexes will be stored "row-wise" (per-row), so all the columns of a single given row reside on a single page (assuming a page can hold only one row):

Page col1 col2 col3
page1 1 2 3
page2 4 5 6

So when you want to read something across multiple rows, e.g. with SUM(col3), SQL Server will need to read both page 1 and page 2 in order to deliver, which is a total "cost" of two pages read in this example.

With columnstore indexes, the same data will be stored "column-wise", in per-column pages instead:

Page row1 row2
page1 1 4
page2 2 5
page3 3 6

This means that if you want to do that same SUM(col3) from before, SQL Server now has to read just one page (page 3), rather than two pages as before.

Columnstore indexes provide a clear benefit for column-centric queries because they allow the processing engine to make fewer reads of data that it doesn't need, using less memory and I/O.

As a result, in large tables where such queries are run frequently, the performance benefit of an appropriate columnstore index can be enormous (up to 10x performance and data compression gains according to the Microsoft SQL docs).

Upvotes: 48

Nguyen Phan Tan
Nguyen Phan Tan

Reputation: 419

Columnstore index is very well explained here: http://www.patrickkeisler.com/2014/04/what-is-non-clustered-columnstore-index.html

The traditional clustered and non-clustered index you mentioned are both rowstore index, where the database stores the index row by row. The index will spread over several partitions, so even when we select only one column, the database still have to scan over all partitions to get the data, hence making a lot of I/O's.

Columnstore index, on the other hand, stores the index column by column. Normally, this will have all data of a column stored within one partition since all the data of one column combined is not that large. Now, when we select 1 column from the index, the database can return the data from one partition, which reduces a lot of I/O's. Moreover, columnstore index often has a significant compression ratio, therefore the I/O is even more efficient and the whole index can be saved in memory, which helps make queries 10x to 100x faster.

Columnstore index does not always perform better than rowstore. Columnstore index is suitable for scenarios like data warehousing and BI, where the data are often processed in mass, such as for aggregations. However, it performs worse than rowstore index in scenarios where data are often searched individual rows.

One thing worth noticing is that non-clustered columnstore index locks your table from being changed (but there are some work-around solutions to change the data), while clustered columnstore index still allows you to edit the data without dropping or disabling the index.

For more information on this topic, please refer to the article above and also try reading the MSDN documents.

Upvotes: 12

Kannan Kandasamy
Kannan Kandasamy

Reputation: 13959

Clustered Columnstore Index is a new feature in SQL Server 2014. Columnstore index allows data to be stored in a columnar format instead of traditional row-based storage. Column store indexes (Non-clustered) were originally introduced in SQL 2012 to address high query performance under high volume requirements typical of Data warehousing/Reporting.

Major points:

  1. It stores data in columnar data structure which aids reads very faster. Stores data in compressed format and hence your total IO cost will be very minimal.
  2. Columnstore data structure is the same data structure where data and indexes everything stored, unlike data stored separately and indexes stored separately etc.,
  3. It will be very useful for more columns table where you select only limited columns daily, for example if there is ProductSalesFact table, you normally select for this product what is the count of sales, or for this quarter what is the sales etc, Eventhough it has hundreds of columns it access only two required columns.

My blog on columnstore index which provide performance study of 300 million records with columnstore vs rowstore

https://sqlserver101.wordpress.com/2016/01/25/why-clustered-columnstore-index-in-sql-server-2014/

MSDN link for various versions of columnstore and paths:

https://msdn.microsoft.com/en-us/library/dn934994.aspx

Upvotes: 6

Related Questions