Reputation: 250
What are the best practices for "importing" streamed data from Kafka into HBase?
The use case is as follows: vehicle sensor data are streamed to Kafka. Afterwards, these sensor data must be transformed (i.e., deserialized from protobuf into human-readable data) and stored in HBase.
1) Which toolset do you recommend (e.g., Kafka --> Flume --> HBase, Kafka --> Storm --> HBase, Kafka --> Spark Streaming --> HBase, Kafka --> HBase)
2) What is the best place for doing the protobuf deserialization (e.g., within Flume using interceptors)?
Thank you for your support.
Best, Thomas
Upvotes: 6
Views: 5242
Reputation: 129
1) I recommend using a 'Kafka Connect' sink connector to stream your data from Kafka to HBase. There are a couple of sink connectors from the Kafka community:
http://docs.datamountaineer.com/en/latest/hbase.html
https://github.com/mravi/kafka-connect-hbase
2) As for transforming your data, you can use Kafka Streams, a lightweight Java library included in Kafka since the 0.10 release in May 2016: http://kafka.apache.org/documentation/streams
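A minimal sketch of the transformation step with Kafka Streams (0.10-era KStreamBuilder API) might look like the following. The topic names `sensor-raw` / `sensor-readable` and the protobuf-generated class `VehicleSensorReading` are assumptions for illustration; the decoded output topic would then be consumed by the HBase sink connector.

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;

public class SensorDeserializerApp {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "sensor-protobuf-decoder");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        KStreamBuilder builder = new KStreamBuilder();

        // Read the raw protobuf bytes from the ingest topic (topic name is a placeholder).
        KStream<String, byte[]> raw =
                builder.stream(Serdes.String(), Serdes.ByteArray(), "sensor-raw");

        // Deserialize each protobuf message into a human-readable string.
        // VehicleSensorReading is a hypothetical protobuf-generated class.
        KStream<String, String> readable = raw.mapValues(bytes -> {
            try {
                return VehicleSensorReading.parseFrom(bytes).toString();
            } catch (Exception e) {
                return null; // in practice: route bad records to an error topic instead
            }
        });

        // Write the decoded records to the topic the HBase sink connector reads from.
        readable.to(Serdes.String(), Serdes.String(), "sensor-readable");

        new KafkaStreams(builder, props).start();
    }
}
```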
Upvotes: 0
Reputation: 1126
I think you just need to do Kafka -> Storm -> HBase.
Storm: a Storm spout will subscribe to the Kafka topic.
Then Storm bolts can transform the data and write it into HBase.
You can use the HBase client API in Java to write data to HBase from Storm, as sketched below.
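Here is a minimal sketch of such a bolt. It assumes a table named `sensor_data` with a column family `d`, a hypothetical protobuf-generated class `VehicleSensorReading`, and an upstream Kafka spout that emits the raw protobuf payload as a `byte[]`; all of these names are placeholders.

```java
import java.util.Map;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;

public class HBaseWriterBolt extends BaseRichBolt {

    private transient Connection connection;
    private transient Table table;
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        try {
            // HBase connection settings are taken from hbase-site.xml on the classpath.
            connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
            table = connection.getTable(TableName.valueOf("sensor_data")); // placeholder table name
        } catch (Exception e) {
            throw new RuntimeException("Could not connect to HBase", e);
        }
    }

    @Override
    public void execute(Tuple tuple) {
        try {
            // The Kafka spout emits the raw protobuf payload as a byte[].
            byte[] payload = tuple.getBinary(0);
            // VehicleSensorReading is a hypothetical protobuf-generated class.
            VehicleSensorReading reading = VehicleSensorReading.parseFrom(payload);

            // Row key and columns are placeholders; design them for your query patterns.
            Put put = new Put(Bytes.toBytes(reading.getVehicleId() + "_" + reading.getTimestamp()));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("speed"),
                          Bytes.toBytes(reading.getSpeed()));
            table.put(put);

            collector.ack(tuple);
        } catch (Exception e) {
            collector.fail(tuple);
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Terminal bolt: nothing is emitted downstream.
    }

    @Override
    public void cleanup() {
        try {
            if (table != null) table.close();
            if (connection != null) connection.close();
        } catch (Exception ignored) {
        }
    }
}
```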
I suggested Storm because it actually processes one tuple at a time, whereas Spark Streaming processes micro-batches. However, if you would like to use a common infrastructure for batch and stream processing, then Spark might be a good choice.
If you end up using Spark, your flow will be Kafka -> Spark Streaming -> HBase; a sketch of that variant follows.
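A minimal sketch of the Spark Streaming variant, using the spark-streaming-kafka-0-10 integration and writing to HBase with the client API. As above, the topic `sensor-raw`, the table `sensor_data`, and the protobuf class `VehicleSensorReading` are assumptions, not part of any of these libraries.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class SensorSparkToHBase {

    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("sensor-to-hbase");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "localhost:9092");
        kafkaParams.put("group.id", "sensor-to-hbase");
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", ByteArrayDeserializer.class);

        JavaInputDStream<ConsumerRecord<String, byte[]>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, byte[]>Subscribe(
                                Arrays.asList("sensor-raw"), kafkaParams));

        stream.foreachRDD(rdd -> rdd.foreachPartition(records -> {
            // Open one HBase connection per partition, not per record.
            try (Connection hbase = ConnectionFactory.createConnection(HBaseConfiguration.create());
                 Table table = hbase.getTable(TableName.valueOf("sensor_data"))) {
                while (records.hasNext()) {
                    ConsumerRecord<String, byte[]> record = records.next();
                    // VehicleSensorReading is a hypothetical protobuf-generated class.
                    VehicleSensorReading reading = VehicleSensorReading.parseFrom(record.value());
                    Put put = new Put(Bytes.toBytes(reading.getVehicleId() + "_" + reading.getTimestamp()));
                    put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"),
                                  Bytes.toBytes(reading.toString()));
                    table.put(put);
                }
            }
        }));

        jssc.start();
        jssc.awaitTermination();
    }
}
```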
Upvotes: 4