Avijit

Reputation: 1810

Load data in HBase from HDFS without using Pig Script

I have .csv files in HDFS. I want to load these in HBASE tables without using Pig script.

Is there any other way available?

Upvotes: 2

Views: 330

Answers (1)

Ram Ghadiyaram

Reputation: 29237

There are several ways to do this. Two of the options are described below.

Option 1: The simple way — ImportTsv

ImportTsv is a utility that loads data in TSV format (or CSV, via -Dimporttsv.separator) into HBase. It has two distinct usages: loading data from TSV files in HDFS into HBase via Puts, and preparing StoreFiles to be loaded via the completebulkload utility.

To load data via Puts (i.e., non-bulk loading):

$ bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=a,b,c <tablename> <hdfs-inputdir>

To generate StoreFiles for bulk-loading:

$ bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=a,b,c -Dimporttsv.bulk.output=hdfs://storefile-outputdir <tablename> <hdfs-data-inputdir>

These generated StoreFiles can then be loaded into HBase with the completebulkload utility (see the "CompleteBulkLoad" section of the HBase Reference Guide).

Example (loading a CSV file, so the separator is set to a comma; the column list must name HBASE_ROW_KEY plus family:qualifier columns, and the table name is a required argument):

$ bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=, -Dimporttsv.columns="HBASE_ROW_KEY,cf:c1,cf:c2,..." <tablename> hdfs://servername:/tmp/yourcsv.csv

Option 2: The custom MapReduce way

Write your own MapReduce program, including a CSV parser, if your CSV is too complex for ImportTsv's simple separator splitting — for example, if it has quoted fields that contain the separator character.

see example here
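As a rough illustration, the parsing core of such a job might look like the sketch below. It handles quoted fields containing commas, which plain separator splitting cannot. The HBase-specific wiring (building a Put from the parsed fields, configuring the job with TableMapReduceUtil) is only indicated in comments, since it depends on your cluster and table layout; the class and method names here are illustrative, not from the original answer.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the CSV-parsing core a custom mapper could use.
// In a real job, the mapper would wrap each parsed row in an
// org.apache.hadoop.hbase.client.Put (keyed by, say, the first field)
// and the job would be wired up with TableMapReduceUtil.
public class CsvRowParser {

    // Split one CSV line, honoring double-quoted fields that may
    // contain the separator character.
    public static List<String> parseCsvLine(String line, char sep) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;          // toggle quoted state
            } else if (c == sep && !inQuotes) {
                fields.add(cur.toString());    // field boundary
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());            // last field
        return fields;
    }

    public static void main(String[] args) {
        // First field would become the row key; the rest, column values.
        List<String> row = parseCsvLine("row1,\"smith, john\",42", ',');
        System.out.println(row.get(1));        // prints "smith, john"
    }
}
```

ImportTsv would split this sample line into four fields at every comma; the quote-aware parser keeps "smith, john" as a single field.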

Upvotes: 2
