Reputation: 187
I am using CDH4.4. I have an app currently running which serializes records via Avro into a single column in HBase. I am in the process of moving my current Solr index of this table into SolrCloud, so I'm testing the MapReduceIndexerTool to do bulk indexing of the whole table. I have a very simple morphlines file which currently uses "extractHBaseCells" to read records from HBase.
I set this up as a tracer proof-of-concept, only mapping the rowkey to id and stuffing the Avro blob into another field, just to verify that I could get data from HBase over to my collection in SolrCloud, and that works. But I'd like to parse the Avro and put those values into their own fields on the Solr documents before submitting them to SolrCloud. It would seem that the nature of "extractHBaseCells" prevents this. If there were an HBase reader command that emitted more general output that could then flow into the Avro commands in morphlines, I am confident I could solve my own problem.
Are there any known workarounds for parsing Avro that has been stored in HBase, or possibly some other morphlines commands that could address this?
Upvotes: 1
Views: 594
Reputation: 187
user1842757's link put me on the right path. My problem was with my Solr schema. I did not have an "_attachment_body" field or an "_attachment_mimetype" field defined in my schema. These are required for extractAvroPaths to work, but this is not clearly stated in any of the tutorials, examples, or PDF manuals I found covering morphlines or the hbase-mr-indexer.
Upvotes: 0
Reputation: 326
Are you able to read just the Avro column and use extractAvroPaths to parse the Avro?
Or, worst case, write a Java action which would cast/transform the HBase Avro column to an Avro object.
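Something along these lines might work as a starting point. This is only a rough sketch, not a tested config: the column name "cf:avro", the Avro paths, and the output field names are made up for illustration, the importCommands package names are my guess for CDH 4.x (newer releases use "org.kitesdk.**"), and it assumes each cell holds an Avro container file (if the cells hold raw Avro datums, readAvro with a writer schema would be the command to look at instead). The idea is to have extractHBaseCells emit the cell value as byte[] into _attachment_body, which is where the Avro parsing commands look for their input. The loadSolr step is left out.

```
morphlines : [
  {
    id : morphline1
    # Assumption: package names for CDH 4.x; adjust to your release.
    importCommands : ["com.cloudera.**", "com.ngdata.**"]

    commands : [
      {
        # Copy the raw cell bytes into _attachment_body for the Avro parser.
        extractHBaseCells {
          mappings : [
            {
              inputColumn : "cf:avro"            # hypothetical family:qualifier
              outputField : "_attachment_body"
              type : "byte[]"
              source : value
            }
          ]
        }
      }

      # Parse the bytes as an Avro container; emits one record per Avro datum.
      { readAvroContainer {} }

      {
        # Pull values out of the parsed Avro record into their own fields.
        extractAvroPaths {
          flatten : true
          paths : {
            name : /name                         # hypothetical Avro field
            city : /address/city                 # hypothetical nested field
          }
        }
      }
    ]
  }
]
```

The keys under paths become the field names on the output record, so they should match fields in your Solr schema.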
Upvotes: 2