user2299491

Reputation: 55

HBase to Hive example with Scalding

I'm trying to read data from HBase, process it and then write to Hive. I'm new to both Scalding and Scala.

I have looked into SpyGlass for reading from HBase. It works well: I can read the data and then write it to a file.

val data = new HBaseSource(
    tableName,
    hbaseHost,
    SCHEMA.head,
    SCHEMA.tail.map((x: Symbol) => "data"),
    SCHEMA.tail.map((x: Symbol) => new Fields(x.name)),
    sourceMode = SourceMode.SCAN_ALL)
  .read
  .fromBytesWritable(SCHEMA)
  .debug
  .write(Tsv(output.format("get_list")))

The question now is how to write the data to Hive. If someone has managed to do this, I would be grateful for a simple example or some pointers.

Upvotes: 1

Views: 561

Answers (1)

Ben Watson

Reputation: 5541

You don't need to do anything special to write to Hive; your current code is fine. Hive simply applies metadata on top of data stored in HDFS, so all you need to do is create a Hive table over the data you're writing. You have two main options. If you want to move your data into the Hive warehouse, you'll need to load it into an existing table with a command like:

load data inpath '/your/file/or/folder/on/the/hdfs' into table your_table;
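The load command needs a target table whose layout matches the files. As a rough sketch only: assuming the Scalding job wrote tab-separated text without a header line, the table could be declared along these lines. The column names and types here are hypothetical and must be replaced with the fields in your SCHEMA.

```sql
-- Hypothetical columns: substitute the fields from your own SCHEMA.
CREATE TABLE your_table (
  row_key STRING,
  col_a   STRING,
  col_b   STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'   -- matches tab-separated Tsv output
STORED AS TEXTFILE;
```

Note that load data inpath moves the files from their current HDFS location into the table's warehouse directory rather than copying them.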

If you don't want to move the data, you can create an external Hive table that points at the data's current location. The advantages of an external table are that

  • you don't have to load data into it,
  • dropping the table doesn't delete the data.
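A minimal sketch of such an external table, assuming the job's output is tab-separated text without a header line. The table name, column names, and types are hypothetical, and the location is a placeholder for the HDFS folder your job writes to.

```sql
-- Hypothetical table and columns: substitute the fields from your own SCHEMA.
CREATE EXTERNAL TABLE your_external_table (
  row_key STRING,
  col_a   STRING,
  col_b   STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'   -- matches tab-separated Tsv output
STORED AS TEXTFILE
LOCATION '/your/folder/on/the/hdfs';  -- must be a directory, not a single file
```

Because the table only points at the directory, files the job writes there later should become visible to queries without a separate load step.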

Upvotes: 1
