Reputation: 1419
In HBase, I have a table that contains 30 columns in a single column family:
create 'my_table', { NAME => 'my_family', VERSIONS => 5 }
I want to increase the number of versions to 10,000:
alter 'my_table', { NAME => 'my_family', VERSIONS => 10000 }
When I change VERSIONS to 10,000, the change applies to every column in the family, but my requirement is to change it for only 2 columns.
What will be the performance impact in each of these two cases?
Create two different column families and set the versions accordingly
Change the versions for all columns
Upvotes: 0
Views: 347
Reputation: 1419
Creating a separate column family is the better option. Keeping unnecessary versions for the other 28 columns will adversely affect performance, because the HStore files grow larger whenever those columns are actually rewritten that often. The larger HBase data set increases the number of regions, which in turn increases the number of mappers per region server.
With two column families, the store files do not hold the unnecessary data, which should mean fewer region splits and less data to rewrite during compaction. I/O performance will be improved.
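A minimal HBase shell sketch of the two-family layout described above. The family names 'cf_main' and 'cf_hist', the row key 'row1', and the column 'col_a' are hypothetical placeholders, not names from the question. Note that moving the two columns into a new family on an existing table would also mean rewriting their data, since the family is part of each cell's coordinates.

```ruby
# Hypothetical family names: 'cf_main' holds the 28 ordinary columns,
# 'cf_hist' holds the 2 columns that need deep history.
create 'my_table',
  { NAME => 'cf_main', VERSIONS => 5 },
  { NAME => 'cf_hist', VERSIONS => 10000 }

# For a table that already exists, the new family could be added instead:
# alter 'my_table', { NAME => 'cf_hist', VERSIONS => 10000 }

# Read up to 10 stored versions of one high-version column
# ('row1' and 'col_a' are placeholder names).
get 'my_table', 'row1', { COLUMN => 'cf_hist:col_a', VERSIONS => 10 }
```

VERSIONS is an upper bound per cell, so the 10,000 limit only costs storage for columns that are actually written that many times.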
If there are two column families A and B, and the cardinality of A is 1 million rows while B is 1 billion rows, the data of A is spread across many regions and region servers. This makes mass scans of column family A less efficient.
Regions are distributed by row key, so even if A has only 1 million rows that are well distributed across the row keys, you may still need to scan all of those regions. I don't think that will have much impact, but it can only be avoided by using a separate table for the two high-version columns.
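The separate-table alternative could look roughly like this in the HBase shell. The table name 'my_table_history' and the family name 'h' are assumptions made for illustration. One trade-off to keep in mind: HBase guarantees atomicity per row within a single table, so a write that touches both tables is no longer atomic.

```ruby
# Hypothetical table and family names: the main table keeps 5 versions,
# the history table keeps up to 10,000 versions of the two columns.
create 'my_table', { NAME => 'my_family', VERSIONS => 5 }
create 'my_table_history', { NAME => 'h', VERSIONS => 10000 }

# Scan the low-version table without touching the history data.
scan 'my_table', { COLUMNS => ['my_family'], LIMIT => 10 }
```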
Upvotes: 0