Reputation: 2153
According to this, HBase only removes duplicate or deleted key-values during a major compaction.
In a major compaction, deleted key/values are removed: this new file doesn’t contain the tombstone markers, and all the duplicate key/values (replace value operations) are removed.
Upvotes: 0
Views: 420
Reputation: 4529
A major compaction is a substantially more expensive and time-consuming operation; think of it as a very granular defragmentation procedure. It has to review each KeyValue along with its type, the maximum number of versions allowed for the column, and its time-to-live. A region split may follow a compaction, depending on configuration rules and environment parameters. Time-based major compaction is enabled by default (see hbase.hregion.majorcompaction), but it is often disabled on production clusters and triggered externally instead. A minor compaction is narrow in scope: it selects fewer (smaller) files and has a much lower impact on latency.
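As a rough illustration of the per-cell checks described above, here is a toy Python model of what a major compaction keeps and drops. It is only a sketch, not HBase's implementation: the `Cell` type, the `major_compact` function, and its parameters are hypothetical names, and a delete marker is simplified to mean "mask every version of that column at or before the marker's timestamp".

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Cell(NamedTuple):
    """Hypothetical stand-in for an HBase KeyValue (illustration only)."""
    row: str
    column: str
    timestamp: int           # write time, same units as `now` and `ttl`
    value: Optional[str]     # None when the cell is a delete marker
    is_delete: bool = False


def major_compact(store_files, max_versions=1, ttl=None, now=0):
    """Merge every store file into one list of surviving cells (toy model)."""
    by_key = defaultdict(list)
    for store_file in store_files:
        for cell in store_file:
            by_key[(cell.row, cell.column)].append(cell)

    compacted = []
    for key in sorted(by_key):
        # Newest version first, so the version limit keeps the latest puts.
        cells = sorted(by_key[key], key=lambda c: c.timestamp, reverse=True)
        # Assumption: the newest tombstone masks all puts at or before it.
        delete_ts = max((c.timestamp for c in cells if c.is_delete), default=None)
        kept = 0
        for cell in cells:
            if cell.is_delete:
                continue  # tombstones are not written to the new file
            if delete_ts is not None and cell.timestamp <= delete_ts:
                continue  # masked by a delete marker
            if ttl is not None and now - cell.timestamp > ttl:
                continue  # older than the time-to-live
            if kept >= max_versions:
                continue  # exceeds the maximum number of versions
            compacted.append(cell)
            kept += 1
    return compacted


# Two store files: r1 was overwritten, r2 was deleted after being written.
older = [Cell("r1", "cf:a", 1, "v1"), Cell("r2", "cf:a", 1, "x")]
newer = [Cell("r1", "cf:a", 2, "v2"), Cell("r2", "cf:a", 3, None, True)]
result = major_compact([older, newer])
```

With these inputs only the newest value of r1 survives. The overwritten value, the deleted r2 value, and the tombstone itself are all left out of the new file, which is the behaviour the question quotes.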
Upvotes: 1
Reputation: 7138
To understand this, we first need to know when the delete markers (tombstones) are added, when they are removed, and how they affect the data in HBase. HBase follows a specific algorithm to decide when to perform minor and major compactions. Please check this, which clearly explains the entire process with metrics. Happy learning and coding :)
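To give a feel for one piece of that decision, here is a simplified Python sketch of ratio-based file selection for a minor compaction. It is loosely modeled on the ratio idea and is not the real policy, which has more conditions (size limits, off-peak ratios, and so on). The function name, parameter names, and default values are assumptions chosen for illustration.

```python
def select_minor_compaction(file_sizes, ratio=1.2, min_files=3, max_files=10):
    """Pick store files for a minor compaction (simplified sketch).

    `file_sizes` is ordered oldest to newest. Returns the indices of the
    selected files, or an empty list when too few files qualify.
    """
    start = 0
    n = len(file_sizes)
    # Skip old files that are much larger than the newer files behind them.
    while start < n:
        newer = file_sizes[start + 1 : start + max_files]
        if file_sizes[start] <= ratio * sum(newer):
            break
        start += 1
    selected = list(range(start, min(n, start + max_files)))
    # Not worth compacting unless enough files were selected.
    return selected if len(selected) >= min_files else []


# One large old file followed by several small, recently flushed files.
picked = select_minor_compaction([1000, 100, 50, 30, 20])
```

In this example the large, oldest file is skipped and only the four small files are merged, which is why a minor compaction is cheap and also why it cannot be relied on to purge tombstones from every file.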
Upvotes: 1