ankursingh1000

Reputation: 1419

Performance impact of column families and versions in HBase

I have an HBase table that contains 30 columns, all in a single column family:

create 'my_table', { NAME => 'my_family', VERSIONS => 5 }

I want to increase the number of versions to 10,000:

create 'my_table', { NAME => 'my_family', VERSIONS => 10000 }

Changing VERSIONS to 10,000 applies to every column in the family, but my requirement is to change it for only 2 columns.

What will the performance impact be in each of these two cases?

  1. Create two different column families and set the version count for each accordingly

  2. Change the version count for all columns

Upvotes: 0

Views: 347

Answers (1)

ankursingh1000

Reputation: 1419

Creating a separate column family is the better option. Preserving unnecessary versions for the other 28 columns will adversely affect performance, because the HStore files grow. As the HBase data grows, the number of regions grows, which in turn increases the number of mappers per region server.

With two column families, the store files do not hold the unnecessary data, which helps reduce splits during compaction, and I/O performance improves.
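A minimal HBase shell sketch of the two-family layout might look like the following. The family names 'base' and 'history' and the column 'col_a' are hypothetical, chosen only for illustration; VERSIONS is set per column family, so only the two highly versioned columns would live in 'history'.

```
# 28 ordinary columns stay in 'base' with 5 versions;
# the 2 highly versioned columns go in 'history' (hypothetical names)
create 'my_table', { NAME => 'base', VERSIONS => 5 }, { NAME => 'history', VERSIONS => 10000 }

# if the table already exists, a new family can be added instead
alter 'my_table', { NAME => 'history', VERSIONS => 10000 }

# check the resulting settings for each family
describe 'my_table'

# read back the latest 3 versions of one highly versioned column
get 'my_table', 'row1', { COLUMN => 'history:col_a', VERSIONS => 3 }
```

Note that a column qualifier belongs to its family, so on an existing table the data for those two columns would have to be rewritten under the new family; the alter alone does not move it.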

One caveat: if there are two column families A and B, and the cardinality of A is 1 million rows while B is 1 billion, the data of A is spread across many regions and region servers. This makes mass scans of column family A less efficient.

Regions are distributed by row key, so even if A has only 1 million rows and they are well distributed across the row keys, you may still need to scan all those regions. I don't think that will have much impact, but it can only be avoided by using a different table for these two highly versioned columns.
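If the separate-table route is preferred, a rough sketch in the HBase shell could be as follows. The table name 'my_table_history' and the column 'col_a' are assumptions for illustration, and the sketch assumes both tables use the same row keys so related rows can be looked up together.

```
# main table keeps the 28 ordinary columns with 5 versions
create 'my_table', { NAME => 'my_family', VERSIONS => 5 }

# hypothetical second table holding only the 2 highly versioned columns
create 'my_table_history', { NAME => 'my_family', VERSIONS => 10000 }

# the same row key is written to both tables
put 'my_table_history', 'row1', 'my_family:col_a', 'value'
```

The trade-off is that a single-row read now touches two tables, and writes across them are not atomic.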

Upvotes: 0
