zjk

Reputation: 2153

Is it true that deleted key-values are removed in HBase only during major compaction?

According to this, HBase removes duplicate or deleted key-values only during major compaction.

In a major compaction, deleted key/values are removed, this new file doesn’t contain the tombstone markers and all the duplicate key/values (replace value operations) are removed.

  1. Major compaction merges all HFiles into one big HFile, while minor compaction selects only some HFiles to merge. Is this the correct understanding?
  2. If major compaction can remove duplicate keys, why can't minor compaction? Aren't the procedures basically the same?

Upvotes: 0

Views: 420

Answers (2)

Sergei Rodionov

Reputation: 4529

A major compaction is a substantially more expensive and time-consuming operation; think of it as a very granular defragmentation procedure. It has to review every KeyValue along with its type, the maximum number of versions configured for the column family, and its time-to-live. A region split may also follow it, depending on configuration rules and environment parameters. Because of the cost, periodic major compaction is often disabled in production and triggered externally instead. A minor compaction is narrower in scope: it selects fewer (and usually smaller) files and has a much lower impact on latency.
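To make the difference in scope concrete, here is a minimal toy sketch in Python. It is not HBase code: `Cell`, `compact`, and `read` are hypothetical names, files are plain lists, and a tombstone is modeled as a cell whose value is `None`. The point it illustrates is that a compaction which sees only some of the files cannot safely drop a tombstone, because an older file outside the selection may still hold the value that the tombstone masks.

```python
from typing import List, NamedTuple, Optional


class Cell(NamedTuple):
    key: str
    ts: int                # timestamp; larger means newer
    value: Optional[str]   # None models a tombstone (delete marker)


def compact(files: List[List[Cell]], drop_tombstones: bool) -> List[Cell]:
    """Merge the given files into one (toy model, not the HBase algorithm).

    Values masked by a tombstone found among the selected files are dropped.
    The tombstones themselves are dropped only when drop_tombstones is True.
    """
    cells = [c for f in files for c in f]
    # Newest tombstone timestamp seen for each key in the selected files.
    deleted_up_to = {}
    for c in cells:
        if c.value is None:
            deleted_up_to[c.key] = max(deleted_up_to.get(c.key, -1), c.ts)
    merged = []
    for c in cells:
        if c.value is None:
            if not drop_tombstones:
                merged.append(c)
        elif c.ts > deleted_up_to.get(c.key, -1):
            merged.append(c)
    return sorted(merged, key=lambda c: (c.key, -c.ts))


def read(files: List[List[Cell]], key: str) -> Optional[str]:
    """Return the newest live value for key, or None if absent or deleted."""
    cells = [c for f in files for c in f if c.key == key]
    if not cells:
        return None
    # On a timestamp tie the tombstone wins.
    newest = max(cells, key=lambda c: (c.ts, c.value is None))
    return newest.value


old_file = [Cell("row1", 1, "v1")]   # original put
mid_file = [Cell("row1", 2, None)]   # later delete of row1
new_file = [Cell("row2", 3, "x")]

# Minor compaction: only the two newer files are selected.
minor_ok = compact([mid_file, new_file], drop_tombstones=False)
minor_bad = compact([mid_file, new_file], drop_tombstones=True)

print(read([old_file, minor_ok], "row1"))   # None: tombstone still masks v1
print(read([old_file, minor_bad], "row1"))  # v1: the deleted value comes back

# Major compaction: every file is selected, so the tombstone can go.
major = compact([old_file, mid_file, new_file], drop_tombstones=True)
print(major)
```

In this model the two procedures really are the same merge; the only difference is whether every file took part. Only when all files are merged is it certain that nothing older remains for the tombstone to hide, which is why dropping it is safe there and unsafe in a partial merge.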

Upvotes: 1

Ramzy

Reputation: 7138

To understand this, we first have to get an idea of when delete markers (tombstones) are added, when they are removed, and how they affect the data in HBase. HBase uses a specific algorithm to decide when to perform minor and major compactions. Please check this, which clearly explains the entire process with metrics. Happy learning and coding :)

Upvotes: 1
