Reputation: 2549
I am new to Hadoop and HBase. Let me explain my question with an example; the data is kept small for brevity.
Let's assume we have a file named item.log that contains the following information:
ITEM-1,PRODUCT-1
ITEM-2,PRODUCT-1
ITEM-3,PRODUCT-2
ITEM-4,PRODUCT-2
ITEM-5,PRODUCT-3
ITEM-6,PRODUCT-1
ITEM-7,PRODUCT-1
ITEM-8,PRODUCT-2
ITEM-9,PRODUCT-1
I have MapReduce code as below:
package org.sanjus.hadoop;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class ProductMapReduce {

    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, LongWritable> {

        public void map(LongWritable key, Text value, OutputCollector<Text, LongWritable> output, Reporter reporter) throws IOException {
            // Each input line looks like "ITEM-x,PRODUCT-y"; emit (product, 1).
            String[] columns = value.toString().split(",");
            if (columns.length != 2) {
                System.out.println("Bad line/value " + value);
                return;
            }
            Text word = new Text(columns[1]);
            LongWritable counter = new LongWritable(1L);
            output.collect(word, counter);
        }
    }

    public static class Reduce extends MapReduceBase implements Reducer<Text, LongWritable, Text, LongWritable> {

        public void reduce(Text key, Iterator<LongWritable> iterator, OutputCollector<Text, LongWritable> output, Reporter reporter) throws IOException {
            // Sum the 1s emitted for each product key.
            long sum = 0L;
            while (iterator.hasNext()) {
                sum += iterator.next().get();
            }
            output.collect(key, new LongWritable(sum));
        }
    }

    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(ProductMapReduce.class);
        conf.setJobName("Product Analyzer");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(LongWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
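Assuming the class is packaged into a jar (the jar name and HDFS paths below are hypothetical), the job can be run with:

hadoop jar product-analyzer.jar org.sanjus.hadoop.ProductMapReduce /user/hadoop/item.log /user/hadoop/output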
LABEL 1: The output after the MapReduce job is below:
PRODUCT-1 5
PRODUCT-2 3
PRODUCT-3 1
Here is my question:
I have a table in HBase with the following information:
PRODUCT-1 10$
PRODUCT-2 20$
PRODUCT-3 30$
Question/Requirement: I want the output of the reduce phase to be a consolidation of the reduce output in "LABEL 1:" and the HBase table stated above:
PRODUCT-1 10$ * 5 = 50$
PRODUCT-2 20$ * 3 = 60$
PRODUCT-3 30$ * 1 = 30$
Basically, the key is PRODUCT-1, the value in the HBase table for this key is 10$, the value for the same key from the reducer is 5, and the two values are multiplied. (The $ symbol is only for clarity.)
Note: The examples I found are all based on HBase being the input or the output. In my scenario, the input and output will be files in HDFS, while I need to process the reducer output against information in an HBase table.
Upvotes: 0
Views: 855
Reputation: 2549
This is what I did: inside my reducer class, I overrode the 'setup' method.
private HTable htable;

@Override
protected void setup(Context context) throws IOException, InterruptedException {
    // Load the HBase client configuration once per reducer task.
    Configuration config = HBaseConfiguration.create();
    config.addResource(new Path("/etc/hbase/conf.hbase1/hbase-site.xml"));
    try {
        htable = new HTable(config, "MY_TABLE");
    } catch (IOException e) {
        System.out.println("Error getting table from HBase: " + e.getMessage());
    }
}
Using the HTable.get API, I then got the Result object for each key.
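For illustration, the lookup and multiplication in the reduce method could look something like the sketch below; the column family "cf" and qualifier "price" are assumptions, as is storing the price as a plain numeric string:

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

@Override
protected void reduce(Text key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
    // Sum the counts for this product, as in the original reducer.
    long count = 0L;
    for (LongWritable value : values) {
        count += value.get();
    }
    // Look up the price stored for this product in HBase.
    Result result = htable.get(new Get(Bytes.toBytes(key.toString())));
    byte[] priceBytes = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("price"));
    if (priceBytes != null) {
        long price = Long.parseLong(Bytes.toString(priceBytes));
        context.write(key, new LongWritable(price * count));
    }
}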
Upvotes: 0
Reputation: 1810
Since HBase supports high read throughput and you only want to read data in the reducer (a controlled number of reducers will be used), you can use the HBase API to read the data from the table based on the reducer's key. Since reads in HBase are fast (~10 ms, depending on the size of the data fetched), I do not think your performance will be impacted. Just make sure you initialize the Configuration and HTable in the configure() method of the reducer.
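A minimal sketch of that initialization with the old mapred API used in the question (the table name is an assumption, and the error handling is one option among several):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

public static class Reduce extends MapReduceBase implements Reducer<Text, LongWritable, Text, LongWritable> {
    private HTable htable;

    @Override
    public void configure(JobConf job) {
        // Runs once per reduce task, before any reduce() calls.
        Configuration config = HBaseConfiguration.create();
        try {
            htable = new HTable(config, "MY_TABLE");
        } catch (IOException e) {
            throw new RuntimeException("Could not open HBase table", e);
        }
    }

    @Override
    public void close() throws IOException {
        // Release the HBase connection when the task finishes.
        htable.close();
    }

    // reduce() can now call htable.get(...) for each key.
}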
Upvotes: 1