Reputation: 187
Hi, I am quite new to Hadoop and I am trying to import a CSV table into HBase using MapReduce.
I am using Hadoop 1.2.1 and HBase 1.1.1.
I have data in the following format:
Wban Number, YearMonthDay, Time, Hourly Precip
03011,20060301,0050,0
03011,20060301,0150,0
I have written the following code for the bulk load:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class BulkLoadDriver extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        int result = ToolRunner.run(HBaseConfiguration.create(), new BulkLoadDriver(), args);
    }

    public static enum COUNTER_TEST {FILE_FOUND, FILE_NOT_FOUND};

    public String tableName = "hpd_table"; // name of the table to be inserted in HBase

    @Override
    public int run(String[] args) throws Exception {
        //Configuration conf = this.getConf();
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "BulkLoad");
        job.setJarByClass(getClass());
        job.setMapperClass(bulkMapper.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        job.setInputFormatClass(TextInputFormat.class);
        TableMapReduceUtil.initTableReducerJob(tableName, null, job); // for the HBase table
        job.setNumReduceTasks(0);
        return (job.waitForCompletion(true) ? 0 : 1);
    }

    private static class bulkMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
        //static class bulkMapper extends TableMapper<ImmutableBytesWritable, Put> {
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String[] val = value.toString().split(",");
            // store the split values in byte format so that they can be added to the Put object
            byte[] wban = Bytes.toBytes(val[0]);
            byte[] ymd = Bytes.toBytes(val[1]);
            byte[] tym = Bytes.toBytes(val[2]);
            byte[] hPrec = Bytes.toBytes(val[3]);
            Put put = new Put(wban);
            put.add(ymd, tym, hPrec);
            System.out.println(wban);
            context.write(new ImmutableBytesWritable(wban), put);
            context.getCounter(COUNTER_TEST.FILE_FOUND).increment(1);
        }
    }
}
I have created a jar for this and ran the following in the terminal:
hadoop jar ~/hadoop-1.2.1/MRData/bulkLoad.jar bulkLoad.BulkLoadDriver /MR/input/200603hpd.txt hpd_table
But the output that I get is hundreds of lines of the following type:
attempt_201509012322_0001_m_000000_0: [B@2d22bfc8
attempt_201509012322_0001_m_000000_0: [B@445cfa9e
I am not sure what they mean or how to perform this bulk upload. Please help.
Thanks in advance.
Upvotes: 0
Views: 3399
Reputation: 1253
There are several ways to import data into HBase. Please have a look at the following link:
http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/admin_hbase_import.html
HBase BulkLoad:
Your data file is already in CSV format, so two steps remain:
Process your data into HFile format. See http://hbase.apache.org/book/hfile_format.html for details about the HFile format. Usually you use a MapReduce job for the conversion, and you often need to write the Mapper yourself because your data is unique. The job must emit the row key as the Key, and either a KeyValue, a Put, or a Delete as the Value. The Reducer is handled by HBase; configure it using HFileOutputFormat.configureIncrementalLoad(), and it does the following (a driver sketch is shown after this step):
One HFile is created per region in the output folder. Input data is almost completely re-written, so you need available disk space at least twice the size of the original data set. For example, for a 100 GB output from mysqldump, you should have at least 200 GB of available disk space in HDFS. You can delete the original input file at the end of the process.
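To produce those HFiles, the driver needs a few extra settings compared to your current TableOutputFormat-style job. Below is a minimal sketch of the body of run(), not a drop-in replacement for your code: it assumes the table hpd_table already exists, that your mapper emits <ImmutableBytesWritable, Put>, and it uses the HBase 1.x client classes Connection, ConnectionFactory, TableName and HFileOutputFormat2 plus org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.

Configuration conf = HBaseConfiguration.create();
Job job = new Job(conf, "BulkLoad");
job.setJarByClass(BulkLoadDriver.class);
job.setMapperClass(bulkMapper.class);
job.setMapOutputKeyClass(ImmutableBytesWritable.class);
job.setMapOutputValueClass(Put.class);
job.setInputFormatClass(TextInputFormat.class);
FileInputFormat.setInputPaths(job, new Path(args[0]));
// HFiles are written to this directory, not straight into the table
FileOutputFormat.setOutputPath(job, new Path(args[1]));

Connection connection = ConnectionFactory.createConnection(conf);
try {
    TableName table = TableName.valueOf("hpd_table");
    // sets the reducer, partitioner and output format needed to produce one HFile per region
    HFileOutputFormat2.configureIncrementalLoad(job,
            connection.getTable(table),
            connection.getRegionLocator(table));
    return job.waitForCompletion(true) ? 0 : 1;
} finally {
    connection.close();
}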
Load the files into HBase. Use the LoadIncrementalHFiles command (more commonly known as the completebulkload tool), passing it a URL that locates the files in HDFS. Each file is loaded into the relevant region on the RegionServer for the region. You can limit the number of versions that are loaded by passing the --versions=N option, where N is the maximum number of versions to include, from newest to oldest (largest timestamp to smallest timestamp). If a region was split after the files were created, the tool automatically splits the HFile according to the new boundaries. This process is inefficient, so if your table is being written to by other processes, you should load as soon as the transform step is done.
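For example, if the job above wrote its HFiles to /MR/output (an illustrative path, substitute your own), the load could be run from the terminal as:

hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /MR/output hpd_table

LoadIncrementalHFiles is the class behind the completebulkload tool mentioned above.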
Upvotes: 2