I have an HBase table with 4 columns. I want to search for a string in column1 and get the value of column2 from every row where there is a match. It works with these two commands:
scan 'table', { COLUMNS => 'column1', FILTER => "ValueFilter(=, 'substring:value')"}
Then, for each matching row:
get 'table', $row, {COLUMNS => 'column2'}
How can I get the result (e.g. 'value1, value2, value3') by executing only one command?
best regards n3
You can pipe commands to the HBase shell from Bash (or any other Unix shell). That way you can build a single-line command or, better yet, a script that performs the task(s) you require.
For example, you can get a list of all the rows that match a value using:
echo "scan 'table', { COLUMNS => 'column1', FILTER => \"ValueFilter(=, 'substring:value')\"}" | hbase shell 2>/dev/null | awk '{print $1}'
NOTE: Don't forget the escape character \ before each double quote around the ValueFilter expression.
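Note that `awk '{print $1}'` also prints the first word of the header and summary lines (and, in piped mode, of the shell banner). If you want only the row keys, a small filter such as the hypothetical `extract_rows` function below may help. It is a sketch that assumes the classic scan output layout, in which every cell line is indented by one space and contains `column=`; the format can differ between HBase versions, so check it against your own output first.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print only the row keys from `hbase shell` scan output
# read on stdin. Assumes the classic layout where each cell line looks like
#   " <rowkey>   column=<cf:qualifier>, timestamp=..., value=..."
# The "ROW  COLUMN+CELL" header and the "N row(s)" summary are not indented,
# so they are skipped. A row with several cells appears on consecutive lines;
# `uniq` collapses those repeats. Row keys containing spaces are not handled.
extract_rows() {
  awk '/^ / && /column=/ { print $1 }' | uniq
}
```

Usage: `echo "scan 'table', { COLUMNS => 'column1', FILTER => \"ValueFilter(=, 'substring:value')\"}" | hbase shell 2>/dev/null | extract_rows`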
EDIT: Here is a script that finds all the rows containing a particular string value in column1 and then fetches the column2 values for those rows:
#!/usr/bin/env bash
# Set variables according to your environment
TABLE="table"
COLUMN1="column1"
COLUMN2="column2"
TEMP="/tmp/temp"
OUTPUT="/tmp/output.txt"
LIMIT=100000000 # Set limit for table scan
VALUE="$1" # The string value to search
if [ -z "$VALUE" ]; then
  echo -e "MISSING PARAMETER!\nUsage: $0 search_string"
  exit 1
fi
# Get all the row names that match $VALUE in $COLUMN1 of $TABLE and store them in the $TEMP file
echo "scan '$TABLE', { COLUMNS => '$COLUMN1', LIMIT => $LIMIT, FILTER => \"ValueFilter(=, 'substring:$VALUE')\"}" | hbase shell 2>/dev/null | grep -v "^$" > "$TEMP"
# The scan output ends with a summary line that starts with the row count, e.g. "3 row(s) in 0.0200 seconds"
NUM_OF_ROWS=$(grep "row(s)" "$TEMP" | tail -n 1 | awk '{print $1}')
if [ -z "$NUM_OF_ROWS" ]; then
  echo "SOMETHING WENT WRONG, EXITING"
  exit 1
fi
if [ "$NUM_OF_ROWS" -eq 0 ]; then
  echo "NO MATCHING ROWS FOUND"
  exit 0
fi
# The matching rows are the $NUM_OF_ROWS lines directly above the summary line
LAST_ROW=$(($(grep -n "row(s)" "$TEMP" | tail -n 1 | awk -F ":" '{print $1}') - 1))
FIRST_ROW=$((LAST_ROW - NUM_OF_ROWS + 1))
# Clear $OUTPUT file
echo "SEARCH RESULTS" > "$OUTPUT"
for ROW in $(awk '{print $1}' "$TEMP" | sed -n "${FIRST_ROW},${LAST_ROW}p")
do
  echo "get '$TABLE','$ROW',{ COLUMNS => '$COLUMN2'}" | hbase shell 2>/dev/null | grep "value" >> "$OUTPUT"
done
# Optional cleanup
# rm -f "$TEMP"
echo "SEARCH COMPLETE, RESULTS STORED IN $OUTPUT"
exit 0
To use the script, run it with a single argument: the string value to search for.
It's not particularly fast, because it starts a new HBase shell for every matching row, but it gets the job done.
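A possible speed-up, sketched here as an untested assumption rather than a proven recipe, is to start the HBase shell only once: generate all the `get` commands first and pipe them into a single session. `build_get_commands` is a hypothetical helper name, not part of HBase; it assumes row keys contain no single quotes, since they are embedded in single-quoted strings.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read row keys (one per line) on stdin and print one
# hbase shell `get` command per key, so that all of them can be piped into a
# single `hbase shell` session instead of starting a shell per row.
# Arguments: $1 = table name, $2 = column to fetch.
# Row keys must not contain a single quote.
build_get_commands() {
  local table="$1" column="$2" row
  # `|| [ -n "$row" ]` also handles a last line without a trailing newline
  while IFS= read -r row || [ -n "$row" ]; do
    [ -n "$row" ] || continue   # skip blank lines
    printf "get '%s', '%s', {COLUMNS => '%s'}\n" "$table" "$row" "$column"
  done
}
```

Inside the script above, the `for` loop could then be replaced by something like `awk '{print $1}' "$TEMP" | sed -n "${FIRST_ROW},${LAST_ROW}p" | build_get_commands "$TABLE" "$COLUMN2" | hbase shell 2>/dev/null | grep "value" >> "$OUTPUT"`.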
I think you can use SingleColumnValueFilter from inside the hbase shell.
scan 'table', {COLUMNS => ['cf:column1', 'cf:column2'], FILTER => "SingleColumnValueFilter('cf', 'column1', =, 'substring:value', true, true)"}
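The first true in the SingleColumnValueFilter is filterIfColumnMissing (rows that have no column1 at all are skipped), and the second true is setLatestVersionOnly (only the latest version of column1 is compared). Because the filter tests column1, keep that column in COLUMNS alongside column2; otherwise the filter treats it as missing. The scan then returns both columns of every matching row in a single command.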
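To turn the output of that single scan into the comma-separated list asked for in the question (e.g. 'value1, value2, value3'), it can be post-processed in Bash. The hypothetical `join_column_values` function below is a sketch that assumes the classic cell layout (` <rowkey>   column=<cf:qualifier>, timestamp=..., value=<value>`) and takes the qualified column name, e.g. `cf:column2`, as its only argument.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `hbase shell` scan output on stdin and print the
# values of one column as a single comma-separated line.
# $1 = qualified column name, e.g. cf:column2.
# Assumes cell lines look like " <row>  column=<cf:q>, timestamp=..., value=<v>".
# The trailing comma in the match keeps cf:column2 from matching cf:column20.
join_column_values() {
  awk -v col="column=$1," '
    index($0, col) {
      i = index($0, ", value=")        # start of the value field
      if (i == 0) next
      v = substr($0, i + 8)            # 8 = length(", value=")
      out = (n++ ? out ", " : "") v    # join with ", "
    }
    END { if (n) print out }'
}
```

Usage: `echo "scan 'table', {COLUMNS => ['cf:column1', 'cf:column2'], FILTER => \"SingleColumnValueFilter('cf', 'column1', =, 'substring:value', true, true)\"}" | hbase shell 2>/dev/null | join_column_values cf:column2`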