Guo

Reputation: 1813

How does HBase internally analyze an "hbase shell" command?

Suppose I run the get 't1','r1' command in the hbase shell. How does HBase internally analyze and execute this command?

Upvotes: 1

Views: 712

Answers (1)

Ram Ghadiyaram

Reputation: 29237

The get command is a JRuby script, defined as one of the shell's commands.

I will use a Java HashMap as an analogy for better understanding.

  • When inserting, your row key acts like the key in a Java HashMap: it determines which region server stores the row (in a HashMap, the equivalent is the buckets, across which keys are uniformly distributed).
  • When reading the row back, HBase uses the row key to locate the region server that holds it and fetches the value from the table you named.
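
One caveat to the analogy: HBase does not hash the row key into a bucket. Each region covers a contiguous, sorted range of row keys, and the client looks for the region whose range contains the key. A minimal sketch of that lookup follows; the region boundaries, server names, and the `locate_region` helper are made up for illustration and are not HBase APIs.

```ruby
# Hypothetical regions, sorted by start key; '' means "from the start of the table".
REGIONS = [
  { start_key: '',  server: 'rs1.example.com' },
  { start_key: 'h', server: 'rs2.example.com' },
  { start_key: 'p', server: 'rs3.example.com' }
].freeze

# Return the last region whose start key is <= the row key (plain string comparison).
def locate_region(row_key, regions = REGIONS)
  regions.reverse_each.find { |region| region[:start_key] <= row_key }
end

puts locate_region('r1')[:server]
```

Because the ranges are sorted, keys that are close together (for example sequential IDs or timestamps) tend to fall into the same region.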

That is why row key design matters so much in HBase: keys should be distributed uniformly across region servers (for example by salting, or by using a hashing algorithm such as MurmurHash) to prevent hot spotting.
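
As a rough illustration of salting, the sketch below prefixes each row key with a bucket number derived from a hash of the key. It is only a sketch under assumptions: `SALT_BUCKETS` and `salted_row_key` are hypothetical names, and MD5 from Ruby's standard library stands in for MurmurHash, which the standard library does not ship.

```ruby
require 'digest'

# Hypothetical bucket count; in practice it is often chosen relative to the number of regions.
SALT_BUCKETS = 8

# Prefix the row key with a stable bucket number derived from a hash of the key,
# so that lexicographically adjacent keys are spread over different key ranges.
def salted_row_key(row_key, buckets = SALT_BUCKETS)
  bucket = Digest::MD5.hexdigest(row_key).to_i(16) % buckets
  format('%02d-%s', bucket, row_key)
end

puts salted_row_key('r1')
```

The trade-off is on the read side: a get has to recompute the same prefix from the original key, and a range scan has to be issued once per bucket.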

For more details, have a look at get.rb

module Shell
  module Commands
    class Get < Command
      def help
        return <<-EOF
Get row or cell contents; pass table name, row, and optionally
a dictionary of column(s), timestamp, timerange and versions. Examples:
  hbase> get 'ns1:t1', 'r1'
  hbase> get 't1', 'r1'
  hbase> get 't1', 'r1', {TIMERANGE => [ts1, ts2]}
  hbase> get 't1', 'r1', {COLUMN => 'c1'}
  hbase> get 't1', 'r1', {COLUMN => ['c1', 'c2', 'c3']}
  hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
  hbase> get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}
  hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
  hbase> get 't1', 'r1', {FILTER => "ValueFilter(=, 'binary:abc')"}
  hbase> get 't1', 'r1', 'c1'
  hbase> get 't1', 'r1', 'c1', 'c2'
  hbase> get 't1', 'r1', ['c1', 'c2']
  hbase> get 't1', 'r1', {COLUMN => 'c1', ATTRIBUTES => {'mykey'=>'myvalue'}}
  hbase> get 't1', 'r1', {COLUMN => 'c1', AUTHORIZATIONS => ['PRIVATE','SECRET']}
  hbase> get 't1', 'r1', {CONSISTENCY => 'TIMELINE'}
  hbase> get 't1', 'r1', {CONSISTENCY => 'TIMELINE', REGION_REPLICA_ID => 1}
Besides the default 'toStringBinary' format, 'get' also supports custom formatting by
column.  A user can define a FORMATTER by adding it to the column name in the get
specification.  The FORMATTER can be stipulated: 
 1. either as a org.apache.hadoop.hbase.util.Bytes method name (e.g, toInt, toString)
 2. or as a custom class followed by method name: e.g. 'c(MyFormatterClass).format'.
Example formatting cf:qualifier1 and cf:qualifier2 both as Integers: 
  hbase> get 't1', 'r1' {COLUMN => ['cf:qualifier1:toInt',
    'cf:qualifier2:c(org.apache.hadoop.hbase.util.Bytes).toInt'] } 
Note that you can specify a FORMATTER by column only (cf:qualifier).  You cannot specify
a FORMATTER for all columns of a column family.

The same commands also can be run on a reference to a table (obtained via get_table or
create_table). Suppose you had a reference t to table 't1', the corresponding commands
would be:
  hbase> t.get 'r1'
  hbase> t.get 'r1', {TIMERANGE => [ts1, ts2]}
  hbase> t.get 'r1', {COLUMN => 'c1'}
  hbase> t.get 'r1', {COLUMN => ['c1', 'c2', 'c3']}
  hbase> t.get 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
  hbase> t.get 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}
  hbase> t.get 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
  hbase> t.get 'r1', {FILTER => "ValueFilter(=, 'binary:abc')"}
  hbase> t.get 'r1', 'c1'
  hbase> t.get 'r1', 'c1', 'c2'
  hbase> t.get 'r1', ['c1', 'c2']
  hbase> t.get 'r1', {CONSISTENCY => 'TIMELINE'}
  hbase> t.get 'r1', {CONSISTENCY => 'TIMELINE', REGION_REPLICA_ID => 1}
EOF
      end

      def command(table, row, *args)
        get(table(table), row, *args)
      end

      def get(table, row, *args)
        @start_time = Time.now
        formatter.header(["COLUMN", "CELL"])

        count, is_stale = table._get_internal(row, *args) do |column, value|
          formatter.row([ column, value ])
        end

        formatter.footer(count, is_stale)
      end
    end
  end
end

#add get command to table
::Hbase::Table.add_shell_command('get')
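
Since the shell is an IRB session running on JRuby, a line such as get 't1', 'r1' is parsed by Ruby itself as a method call with two string arguments; there is no separate command parser. Roughly speaking, the shell registers each command class under its name and forwards the call to that class's command method. The toy model below illustrates the idea only: `ToyShell`, its in-memory store, and its method names are invented for this sketch and are not the HBase source.

```ruby
# Toy model of name-to-class dispatch; not the real HBase shell code.
module ToyShell
  COMMANDS = {}

  class Command
    def initialize(store)
      @store = store
    end
  end

  class Get < Command
    # Same shape as Shell::Commands::Get#command(table, row, *args) in get.rb.
    def command(table, row, *_args)
      @store.fetch(table, {}).fetch(row, nil)
    end
  end

  def self.store=(store)
    @store = store
  end

  # Register a command class and expose it as a method named after the command.
  def self.load_command(name, klass)
    COMMANDS[name] = klass
    define_singleton_method(name) do |*args|
      COMMANDS.fetch(name).new(@store).command(*args)
    end
  end
end

# A nested hash stands in for the cluster: table -> row -> cells.
ToyShell.store = { 't1' => { 'r1' => { 'cf:c1' => 'value1' } } }
ToyShell.load_command('get', ToyShell::Get)

# Ruby parses this line as ToyShell.get('t1', 'r1').
result = ToyShell.get 't1', 'r1'
p result
```

In the real get.rb, the equivalent of the hash lookup is the call to table._get_internal, which goes through the Java client to the region server.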

Update based on your comment: if you want the same functionality in Java, i.e. to fetch one record the way the hbase shell command does, you can follow the snippet below.

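    // Uses the older HTable/KeyValue client API, as in the original snippet.
    // Needs: java.io.IOException, org.apache.hadoop.hbase.KeyValue, org.apache.hadoop.hbase.util.Bytes,
    // and Get, HTable, Result from org.apache.hadoop.hbase.client.
    /**
     * Get a row
     */
    @Override
    public void getOneRecord(final String tableName, final String rowKey) throws IOException {
        // HBaseConn and getTable(...) are presumably helpers from the surrounding class.
        final HTable table = new HTable(HBaseConn.getHBaseConfig(), getTable(tableName));
        try {
            final Get get = new Get(Bytes.toBytes(rowKey));
            final Result rs = table.get(get);
            for (final KeyValue kv : rs.raw()) {
                // Decode each byte[]; concatenating it directly would log an array reference.
                LOG.info(Bytes.toString(kv.getRow()) + " " + Bytes.toString(kv.getFamily()) + ":"
                        + Bytes.toString(kv.getQualifier()) + " " + kv.getTimestamp());
                LOG.info(Bytes.toString(kv.getValue()));
            }
        } finally {
            table.close(); // release the table's client resources
        }
    }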

Note: the Java approach and the shell approach are two different things, so please don't mix them. Having seen your other questions as well, I think you are a bit confused about the two. If you prefer, you can also write JRuby as I explained above, but that is not the common approach.

Hope that helps.

Upvotes: 2
