dijin

Reputation: 61

Retrieving HBase versioned data

I am trying to retrieve different versions of HBase data.

Step 1 - Table abc has 4 columns, all at version 1 and in a single column family.

a b c d

1 1 1 1

Step 2 - The values of columns b and c change, and we load the updated values as version 2. (Columns b and c now have both version 1 and version 2 data.)

a b c d

1 1/2 1/2 1

I want to retrieve the following combination of versions from HBase:

a b c d

1 1 2 1

Is there any way to achieve this?

Thanks in advance.

Upvotes: 2

Views: 206

Answers (1)

Martin Serrano

Reputation: 3795

HBase has decent documentation on this concept:

The maximum number of versions to store for a given column is part of the column schema and is specified at table creation, or via an alter command, via HColumnDescriptor.DEFAULT_VERSIONS. Prior to HBase 0.96, the default number of versions kept was 3, but in 0.96 and newer has been changed to 1.

So if you are designing a schema now, you can set things up to have a particular number of prior versions stored. If the HBase table already exists, you can alter it but won't be able to get prior versions for data that has already been stored.

Here is an example for getting prior versions for a column (it comes from that documentation):

public static final byte[] CF = "cf".getBytes();
public static final byte[] ATTR = "attr".getBytes();
...
Get get = new Get(Bytes.toBytes("row1"));
get.setMaxVersions(3);  // will return up to the last 3 versions of each column
Result r = table.get(get);
byte[] b = r.getValue(CF, ATTR);  // returns current version of value
List<KeyValue> kv = r.getColumn(CF, ATTR);  // returns all versions of this column
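Once all versions of each column have been fetched as above, choosing a different version per column (version 1 of b, version 2 of c, as in the question) can be done on the client side. The following is only a minimal sketch of that selection step in plain Java, not HBase API code: the class `VersionPicker`, the method `pick`, and the map layout (column name to values, newest first) are hypothetical names made up for illustration, and it assumes that "version n" means the n-th oldest value still stored for that column.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of HBase: picks one stored value per column.
class VersionPicker {
    // versionsNewestFirst: column -> stored values, newest first.
    // wanted: column -> 1-based version number counted from the oldest stored value.
    // Columns missing from 'wanted' fall back to their newest value.
    static Map<String, String> pick(Map<String, List<String>> versionsNewestFirst,
                                    Map<String, Integer> wanted) {
        Map<String, String> row = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : versionsNewestFirst.entrySet()) {
            List<String> values = e.getValue();
            int version = wanted.getOrDefault(e.getKey(), values.size());
            // Version 1 is the oldest value, i.e. the last element of the list.
            row.put(e.getKey(), values.get(values.size() - version));
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, List<String>> stored = new LinkedHashMap<>();
        stored.put("a", Arrays.asList("1"));
        stored.put("b", Arrays.asList("2", "1"));
        stored.put("c", Arrays.asList("2", "1"));
        stored.put("d", Arrays.asList("1"));

        Map<String, Integer> wanted = new LinkedHashMap<>();
        wanted.put("b", 1);
        wanted.put("c", 2);

        System.out.println(pick(stored, wanted));
    }
}
```

Asking for a version number that is no longer stored (for example because the column family keeps only one version) makes `values.get` throw an `IndexOutOfBoundsException`, so real code would need to handle that case.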

It is very important to keep in mind that, from HBase's point of view, a version is identified by the timestamp used when writing. A default put uses the time of its execution as its timestamp, so normally the versions follow the order of our changes. But given two put operations with timestamps T1 and T2, where T1 is less than T2, the T1 value will still appear as the earlier version even if it is actually written after T2. It is the timestamp HBase cares about, not when in absolute time the value was written. This makes it possible, for instance, to overwrite an earlier version by writing again with the same timestamp.
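To make the timestamp rule concrete, here is a toy in-memory model of a single cell's versions. It is a sketch in plain Java and deliberately does not use the HBase client: `VersionedCell`, `put`, and `read` are hypothetical names, and the model only mimics the two behaviours described above (ordering by timestamp rather than write order, and replacement on an equal timestamp).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Toy model of one cell's versions; NOT the HBase client API.
class VersionedCell {
    // Keyed by timestamp, newest first.
    private final TreeMap<Long, String> versions =
            new TreeMap<>(Collections.reverseOrder());

    // Writing with an existing timestamp replaces that version.
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    // Returns up to maxVersions values, newest timestamp first.
    List<String> read(int maxVersions) {
        List<String> out = new ArrayList<>();
        for (String value : versions.values()) {
            if (out.size() == maxVersions) {
                break;
            }
            out.add(value);
        }
        return out;
    }

    public static void main(String[] args) {
        VersionedCell cell = new VersionedCell();
        cell.put(200L, "written first, timestamp T2");
        cell.put(100L, "written second, timestamp T1");
        // The T2 value is listed first although it was written earlier.
        System.out.println(cell.read(3));

        cell.put(100L, "replacement for T1");
        System.out.println(cell.read(3));
    }
}
```

In this model the write order never affects what `read` returns; only the timestamps do, which is the point the paragraph above makes about HBase.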

Upvotes: 1
