Matthew Moisen

Reputation: 18279

How to Store into HBase using Pig and HBaseStorage

In the HBase shell, I created my table via:

create 'pig_table','cf'

In Pig, here are the results of the alias I wish to store into pig_table:

DUMP B;

Produces tuples with 6 fields:

(D1|30|2014-01-01 13:00,D1,30,7.0,2014-01-01 13:00,DEF)
(D1|30|2014-01-01 22:00,D1,30,1.0,2014-01-01 22:00,JKL)
(D10|20|2014-01-01 11:00,D10,20,4.0,2014-01-01 11:00,PQR)
...

The first field is a concatenation of the second, third, and fifth fields, and will be used as the HBase rowkey.

However, this statement:

STORE B INTO 'hbase://pig_table' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage ( 'cf:device_id,cf:cost,cf:hours,cf:start_time,cf:code')

results in:

Failed to produce result in "hbase:pig_table"

The logs are giving me:

Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.pig.data.DataByteArray
at org.apache.pig.backend.hadoop.hbase.HBaseStorage.objToBytes(HBaseStorage.java:924)
at org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:875)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:551)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:99)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:468)
... 11 more

What is wrong with my syntax?

Upvotes: 1

Views: 8911

Answers (1)

Matthew Moisen

Reputation: 18279

It appears that HBaseStorage does not automatically convert the fields of each tuple to chararray, which is necessary before they can be stored in HBase. I simply cast them as follows:

-- B has 6 fields ($0 through $5): the rowkey followed by the 5 column values.
C = FOREACH B {
    GENERATE
    (chararray)$0
    ,(chararray)$1
    ,(chararray)$2
    ,(chararray)$3
    ,(chararray)$4
    ,(chararray)$5
    ;
}

STORE C INTO 'hbase://pig_table' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage ( 'cf:device_id,cf:cost,cf:hours,cf:start_time,cf:code');
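
To check the field types and confirm that the rows landed in HBase, something like the following may help. This is only a sketch: the alias name `stored` and the field names in the AS clause are my own illustrative choices, and it assumes the same table and column family as above. DESCRIBE prints the schema Pig has recorded for a relation, and the `-loadKey true` option asks HBaseStorage to return the rowkey as the first field when loading.

```pig
-- Print the schema Pig has recorded for B. Fields with no declared type
-- are reported as bytearray; a relation with no schema is reported as unknown.
DESCRIBE B;

-- Read the stored rows back from HBase.
-- '-loadKey true' returns the rowkey as the first field of each tuple.
-- The alias `stored` and the AS schema below are illustrative assumptions.
stored = LOAD 'hbase://pig_table'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'cf:device_id,cf:cost,cf:hours,cf:start_time,cf:code', '-loadKey true')
    AS (rowkey:chararray, device_id:chararray, cost:chararray,
        hours:chararray, start_time:chararray, code:chararray);

DUMP stored;
```

Running `scan 'pig_table'` in the HBase shell is another way to look at the same rows.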

Upvotes: 2
