Adarsh

Reputation: 227

How to use Spark Streaming to get data from an HBase table using Scala

I am trying to find a way to read data from an HBase table using Spark Streaming and write that data to another HBase table.

I found numerous samples on the internet that create a DStream to get data from HDFS files and similar sources, but I was unable to find any examples that get data from HBase tables.

For example, say I have an HBase table 'SAMPLE' with the columns 'name' and 'activeStatus'. How can I retrieve data from SAMPLE based on the activeStatus column using Spark Streaming (that is, new data as it arrives)?

Any examples of retrieving data from an HBase table using Spark Streaming are welcome.

Regards, Adarsh K S

Upvotes: 0

Views: 1277

Answers (2)

Hari

Reputation: 451

You can connect to HBase from Spark in multiple ways, for example with Hortonworks SHC or with hbase-rdd.

Hortonworks SHC reads HBase directly into a DataFrame using a user-defined catalog, whereas hbase-rdd reads it as an RDD, which can then be converted to a DataFrame with the toDF method. hbase-rdd also has a bulk-write option (writing HFiles directly), which is preferred for writing massive amounts of data.
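HBase is not a built-in Spark Streaming source, and as far as I know both libraries above read HBase as a batch. Below is a minimal sketch of one workaround for the question's SAMPLE/activeStatus case. Note that it uses the plain HBase MapReduce formats (TableInputFormat/TableOutputFormat) rather than hbase-rdd. A dummy ConstantInputDStream fires once per batch. Each batch then re-scans SAMPLE for cells written since the previous batch (using the scan time range), keeps the rows whose activeStatus is "active", and writes them to another table. The object name, the target table SAMPLE_ACTIVE, the column family cf, the value "active" and the 60-second interval are all hypothetical assumptions.

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{Put, Result}
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.{TableInputFormat, TableOutputFormat}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext, Time}
import org.apache.spark.streaming.dstream.ConstantInputDStream

// Sketch only. SAMPLE_ACTIVE, the "cf" family, the "active" value and the
// 60 s interval are hypothetical; SAMPLE_ACTIVE must already exist with family "cf".
object HBasePollingCopy {
  val SourceTable = "SAMPLE"
  val TargetTable = "SAMPLE_ACTIVE"
  val Cf: Array[Byte] = Bytes.toBytes("cf")
  val NameCol: Array[Byte] = Bytes.toBytes("name")
  val StatusCol: Array[Byte] = Bytes.toBytes("activeStatus")

  // Filter predicate kept pure so it can be tested without HBase.
  def isActive(status: Option[String]): Boolean = status.contains("active")

  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("hbase-polling-copy"), Seconds(60))
    val sc = ssc.sparkContext
    // HBase is not a streaming source: this constant dummy DStream only triggers each batch.
    val ticks = new ConstantInputDStream(ssc, sc.parallelize(Seq(0)))
    var lastEnd = 0L // each batch scans cells with timestamps in [lastEnd, batch time)

    ticks.foreachRDD { (_: RDD[Int], batchTime: Time) =>
      // foreachRDD bodies run on the driver, so building a new RDD here is allowed.
      val end = batchTime.milliseconds
      val readConf = HBaseConfiguration.create()
      readConf.set(TableInputFormat.INPUT_TABLE, SourceTable)
      readConf.set(TableInputFormat.SCAN_COLUMNS, "cf:name cf:activeStatus")
      readConf.set(TableInputFormat.SCAN_TIMERANGE_START, lastEnd.toString)
      readConf.set(TableInputFormat.SCAN_TIMERANGE_END, end.toString)

      val rows = sc.newAPIHadoopRDD(readConf, classOf[TableInputFormat],
        classOf[ImmutableBytesWritable], classOf[Result])

      // Keep active rows and turn each Result into a Put for the target table.
      val puts = rows
        .filter { case (_, r) =>
          isActive(Option(r.getValue(Cf, StatusCol)).map(b => Bytes.toString(b)))
        }
        .map { case (_, r) =>
          val put = new Put(r.getRow)
          put.addColumn(Cf, StatusCol, r.getValue(Cf, StatusCol))
          Option(r.getValue(Cf, NameCol)).foreach(n => put.addColumn(Cf, NameCol, n))
          (new ImmutableBytesWritable(r.getRow), put)
        }

      val job = Job.getInstance(HBaseConfiguration.create())
      job.getConfiguration.set(TableOutputFormat.OUTPUT_TABLE, TargetTable)
      job.setOutputFormatClass(classOf[TableOutputFormat[ImmutableBytesWritable]])
      puts.saveAsNewAPIHadoopDataset(job.getConfiguration)
      lastEnd = end
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The time range applies to every cell. As a result, a row is only picked up when its activeStatus cell itself was written during the window. The first batch starts at timestamp 0, so it copies everything that is currently active.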

Upvotes: 2

b2Wc0EKKOvLPn

Reputation: 2074

What you need is a library that enables Spark to interact with HBase. Hortonworks' SHC is such an extension:

https://github.com/hortonworks-spark/shc
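Here is a minimal sketch of the SAMPLE → other-table copy with SHC. It follows the catalog and options pattern from the SHC README (`HBaseTableCatalog.tableCatalog`, `HBaseTableCatalog.newTable` and the `org.apache.spark.sql.execution.datasources.hbase` format). Several details are hypothetical assumptions: the catalog layout (string row key, column family cf), the target table SAMPLE_ACTIVE, the status value "active" and the object name. SHC reads as a batch DataFrame, so for "new data" this job would have to be re-run periodically.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

object ShcActiveCopy {
  // Hypothetical catalog: string row key and a column family named "cf".
  def catalog(table: String): String =
    s"""{
       |"table":{"namespace":"default", "name":"$table"},
       |"rowkey":"key",
       |"columns":{
       |"rowkey":{"cf":"rowkey", "col":"key", "type":"string"},
       |"name":{"cf":"cf", "col":"name", "type":"string"},
       |"activeStatus":{"cf":"cf", "col":"activeStatus", "type":"string"}
       |}
       |}""".stripMargin

  val Format = "org.apache.spark.sql.execution.datasources.hbase"

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("shc-active-copy").getOrCreate()
    import spark.implicits._

    // Read SAMPLE as a DataFrame and keep only the active rows.
    val active = spark.read
      .options(Map(HBaseTableCatalog.tableCatalog -> catalog("SAMPLE")))
      .format(Format)
      .load()
      .filter($"activeStatus" === "active")

    // Write to the target table; newTable -> "5" asks SHC to create it with 5 regions if missing.
    active.write
      .options(Map(HBaseTableCatalog.tableCatalog -> catalog("SAMPLE_ACTIVE"),
        HBaseTableCatalog.newTable -> "5"))
      .format(Format)
      .save()

    spark.stop()
  }
}
```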

Upvotes: 1
