t k

Reputation: 65

Getting all Versions of one Record in Apache Pig with HBaseStorage()

I'm using org.apache.pig.backend.hadoop.hbase.HBaseStorage() in Pig Latin. Is it possible to get all versions of a value?

In the HBase Shell the command is:

get 'table', 'rowkey', {COLUMN => 'cf:column', VERSIONS => 5}

Is it possible in Pig too?

Input Sample:

My HBase Table looks like this:

hbase(main):019:0> put 'mytable', 'mykey1', 'cf:onlyoneclumn', 'value 1'
0 row(s) in 0.0070 seconds

hbase(main):020:0> put 'mytable', 'mykey1', 'cf:onlyoneclumn', 'value 2'
0 row(s) in 0.0070 seconds

hbase(main):021:0> put 'mytable', 'mykey1', 'cf:onlyoneclumn', 'value 3'
0 row(s) in 0.0050 seconds

hbase(main):022:0> put 'mytable', 'mykey1', 'cf:onlyoneclumn', 'value 4'
0 row(s) in 0.0050 seconds

hbase(main):023:0> get 'mytable', 'mykey1'
COLUMN                CELL                                                      
 cf:onlyoneclumn      timestamp=1376470137654, value=value 4                    
1 row(s) in 0.0370 seconds

hbase(main):024:0> get 'mytable', 'mykey1',  {COLUMN => 'cf:onlyoneclumn', VERSIONS => 5}
COLUMN                CELL                                                      
 cf:onlyoneclumn      timestamp=1376470137654, value=value 4                    
 cf:onlyoneclumn      timestamp=1376470136632, value=value 3                    
 cf:onlyoneclumn      timestamp=1376470135411, value=value 2                    
3 row(s) in 0.0140 seconds

Upvotes: 0

Views: 975

Answers (1)

Arnon Rotem-Gal-Oz

Reputation: 25909

The current (0.11.1) HBaseStorage, the class Pig ships with for reading HBase tables, does not support this. It only supports the following options:

optString - Loader options. Known options:
-loadKey=(true|false) Load the row key as the first column
-gt=minKeyVal
-lt=maxKeyVal
-gte=minKeyVal
-lte=maxKeyVal
-limit=numRowsPerRegion max number of rows to retrieve per region
-delim=char delimiter to use when parsing column names (default is space or comma)
-ignoreWhitespace=(true|false) ignore spaces when parsing column names (default true)
-caching=numRows number of rows to cache (faster scans, more memory).
-noWAL=(true|false) Sets the write ahead to false for faster loading. To be used with extreme caution, since this could result in data loss (see http://hbase.apache.org/book.html#perf.hbase.client.putwal).
-minTimestamp= Scan's timestamp for min timeRange
-maxTimestamp= Scan's timestamp for max timeRange
-timestamp= Scan's specified timestamp
-caster=(HBaseBinaryConverter|Utf8StorageConverter) Utf8StorageConverter is the default
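
For reference, these options are passed as a single string in the second argument of HBaseStorage. The sketch below is only a partial workaround, not a way to get all versions at once. It assumes the table and timestamps from the question, and it assumes that the loader's scan still returns a single version per cell, so -timestamp picks which version is loaded rather than returning several. The alias and field names (value3, rowkey, val) are made up for illustration.

```pig
-- Sketch: load one specific older version of the cell by its timestamp.
-- 1376470136632 is the timestamp shown for 'value 3' in the question's
-- shell output; it will differ on any other table.
value3 = LOAD 'hbase://mytable'
         USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
             'cf:onlyoneclumn',
             '-loadKey true -timestamp 1376470136632')
         AS (rowkey:chararray, val:chararray);

DUMP value3;
```

Since this needs one LOAD per known timestamp, it only helps when the timestamps of interest are known in advance.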

What you can do is take the HBaseStorage source code and build your own loader that does support multiple versions.

Upvotes: 1
