n-3

Reputation: 71

Combine a few HBase shell commands

I have an HBase table with 4 columns. I want to search for a string in column1 and get the value of column2 from every row where there is a match. It works with these two lines of code:

scan 'table', { COLUMNS => 'column1', FILTER => "ValueFilter(=, 'substring:value')"}

Then, for each matching row: get 'table', $row, {COLUMNS => 'column2'}

How can I get the result (e.g. 'value1, value2, value3') by executing only one command?

best regards n3

Upvotes: 0

Views: 1569

Answers (2)

Martin K

Reputation: 136

You can pipe commands to the HBase shell from Bash (or any other Unix shell). From there you can build a single-line command or, better yet, a script that performs the task(s) you require.

For example, you can get a list of all the rows that match a value using:

echo "scan 'table', { COLUMNS => 'column1', FILTER => \"ValueFilter(=, 'substring:value')\"}" | hbase shell 2>/dev/null | awk '{print $1}'

NOTE: Don't forget the escape char \ for the double quotes around ValueFilter
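Note that awk '{print $1}' also prints the first word of the header and summary lines. A stricter filter could look like the sketch below. It assumes the shell prints each cell as "<rowkey> column=..., timestamp=..., value=..." (the exact layout may differ between HBase versions) and that row keys contain no whitespace. The function name extract_row_keys is made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the unique row keys found in `hbase shell`
# scan output read from stdin.
# Assumption: cell lines look like " row1 column=cf:c1, timestamp=..., value=..."
extract_row_keys() {
  # Keep only lines whose second field starts with "column=",
  # and print each row key (first field) once.
  awk '$2 ~ /^column=/ && !seen[$1]++ { print $1 }'
}
```

It would replace the awk call at the end of the pipeline above: ... | hbase shell 2>/dev/null | extract_row_keys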

EDIT: Here is a script that will find all the rows that contain a particular string value within column1, then obtain values within column2 for those rows:

#!/usr/bin/env bash

# Set variables according to your environment
TABLE="table"
COLUMN1="column1"
COLUMN2="column2"

TEMP="/tmp/temp"
OUTPUT="/tmp/output.txt"
LIMIT=100000000   # Set limit for table scan
VALUE=$1          # The string value to search

# Quote the argument so the test also works for empty or multi-word input
if [ -z "$1" ]; then
  echo -e "MISSING PARAMETER!\nUsage: $0 search_string"
  exit 1
fi

# Get all the row names that match $VALUE in $COLUMN1 of $TABLE and store in $TEMP file
echo "scan '$TABLE', { COLUMNS => '$COLUMN1', LIMIT => $LIMIT, FILTER => \"ValueFilter(=, 'substring:$VALUE')\"}" | hbase shell 2>/dev/null | grep -v "^$" > $TEMP

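# The summary line ("N row(s) ...") gives the number of matching rows
NUM_OF_ROWS=$(grep "row(s)" "$TEMP" | awk '{print $1}')

# Bail out before doing any arithmetic if the summary line is missing
if [ -z "$NUM_OF_ROWS" ]; then
  echo "SOMETHING WENT WRONG, EXITING"
  exit 1
fi

# The matching rows are the NUM_OF_ROWS lines directly above the summary line
LAST_ROW=$(($(grep -n "row(s)" "$TEMP" | awk -F ":" '{print $1}')-1))
FIRST_ROW=$((LAST_ROW-NUM_OF_ROWS+1))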

# Clear $OUTPUT file
echo "SEARCH RESULTS" > $OUTPUT

for ROW in $(cat $TEMP | awk '{print $1}' | sed -n ${FIRST_ROW},${LAST_ROW}p)
do
  echo "get '$TABLE','$ROW',{ COLUMNS => '$COLUMN2'}" | hbase shell 2>/dev/null | grep "value" >> $OUTPUT
done

# Optional cleanup
# rm -f $TEMP

echo "SEARCH COMPLETE, RESULTS STORED IN $OUTPUT"

exit 0

To use the script, simply execute it with one parameter indicating the string value to search.

It's not particularly fast but it gets the job done.
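Most of the time goes into starting a new hbase shell (and with it a JVM) for every matching row. One possible speed-up is to generate all get commands first and pipe them into a single shell session. The sketch below assumes that row keys contain no single quotes or whitespace. The function name build_get_commands is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn row keys (one per line on stdin) into HBase shell
# `get` commands, so that a single `hbase shell` process can run all of them.
# $1 = table name, $2 = column to fetch
build_get_commands() {
  local table="$1" column="$2" row
  while IFS= read -r row; do
    [ -n "$row" ] || continue   # skip blank lines
    printf "get '%s', '%s', { COLUMNS => '%s' }\n" "$table" "$row" "$column"
  done
}
```

In the script above, the for loop could then be replaced by one pipeline: sed -n "${FIRST_ROW},${LAST_ROW}p" "$TEMP" | awk '{print $1}' | build_get_commands "$TABLE" "$COLUMN2" | hbase shell 2>/dev/null | grep "value" >> "$OUTPUT"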

Upvotes: 1

Kadir

Reputation: 1762

I think you can use SingleColumnValueFilter from inside the hbase shell.

scan 'table', {COLUMNS => ['cf:column1', 'cf:column2'], FILTER => "SingleColumnValueFilter('cf', 'column1', =, 'substring:value', true, true)"}

The first true in the SingleColumnValueFilter is the filterIfColumnMissing flag (rows that do not contain column1 are skipped), and the second true is the latestVersionOnly flag (only the newest version of the cell is compared).
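To turn that scan output into the single line asked for in the question ('value1, value2, value3'), it can be post-processed outside the HBase shell. The sketch below assumes each cell is printed on one line ending in value=<value>, and that the values contain neither line breaks nor the text "value=". The name join_column_values is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `hbase shell` scan output on stdin and print the
# values of one column as a comma-separated list.
# $1 = column as it appears in the output, e.g. cf:column2
join_column_values() {
  awk -v col="column=$1," '
    $2 == col {
      v = $0
      sub(/^.*value=/, "", v)      # keep only the text after "value="
      if (n++) out = out ", "      # separator before every value but the first
      out = out v
    }
    END { if (n) print out }
  '
}
```

For example, the scan above could be piped from Bash as shown in the other answer, with join_column_values cf:column2 appended after hbase shell 2>/dev/null.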

Upvotes: 1
