Reputation: 31
I want to send data from Kafka (after doing some MapReduce job on it) to Hive.
Is Spark Streaming suitable for this?
Or is there a better way?
Upvotes: 3
Views: 5761
Reputation: 4154
There's already a Hive-Kafka ETL integration documented in Hive.
Users can create an external table that acts as a view over a Kafka topic.
For more info: https://github.com/apache/hive/tree/master/kafka-handler
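As a rough sketch of what that looks like, the kafka-handler DDL can be issued over Hive JDBC; here it's driven from Scala, assuming HiveServer2 on localhost:10000 and a placeholder topic, table name, and schema:

```scala
import java.sql.DriverManager

object CreateKafkaBackedTable {
  def main(args: Array[String]): Unit = {
    // hive-jdbc must be on the classpath; HiveServer2 assumed at localhost:10000
    val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "hive", "")
    val stmt = conn.createStatement()
    // External table that reads the Kafka topic directly via the Hive Kafka storage handler
    stmt.execute(
      """CREATE EXTERNAL TABLE kafka_events (`event_time` timestamp, `payload` string)
        |STORED BY 'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
        |TBLPROPERTIES (
        |  "kafka.topic" = "my-topic",
        |  "kafka.bootstrap.servers" = "localhost:9092"
        |)""".stripMargin)
    stmt.close()
    conn.close()
  }
}
```

Once the table exists, the topic can be queried with ordinary HiveQL, so no separate ingestion job is needed for simple cases.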
Upvotes: 1
Reputation: 191983
From a streaming perspective, Hive tables built ahead of time and written into by Spark Streaming or Flink will work fine, for the most part. But what if the schema of the Hive output changes in the Spark job? That's where you might want something like StreamSets, the Kafka Connect HDFS connector, or Apache Gobblin.
Also, keep in mind that HDFS doesn't deal well with tiny files, so setting up a large batch size before writing to HDFS will help later Hive consumption.
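A minimal sketch of that approach (not the asker's actual job): Spark Structured Streaming reads from Kafka, writes Parquet batches to an HDFS path on a long trigger interval to keep files reasonably sized, and a Hive external table is defined over the same path. The topic, paths, and interval are placeholders:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

object KafkaToHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-hive")
      .enableHiveSupport()
      .getOrCreate()

    // Stream from Kafka; keep key/value as strings for this sketch
    val kafka = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "my-topic")
      .load()
      .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")

    // A long trigger interval batches more records per file,
    // which mitigates the small-files problem mentioned above
    val query = kafka.writeStream
      .format("parquet")
      .option("path", "hdfs:///warehouse/events")
      .option("checkpointLocation", "hdfs:///checkpoints/events")
      .trigger(Trigger.ProcessingTime("5 minutes"))
      .start()

    // Hive sees the data through an external table over the same location
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS events (key STRING, value STRING)
        |STORED AS PARQUET
        |LOCATION 'hdfs:///warehouse/events'""".stripMargin)

    query.awaitTermination()
  }
}
```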
Upvotes: 1
Reputation: 32130
You can use Kafka Connect and the HDFS connector to do this. This streams data from Kafka to HDFS, and defines the Hive table on top automatically. It's available standalone or as part of Confluent Platform.
Disclaimer: I work for Confluent.
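For illustration, a connector like this is registered through the Kafka Connect REST API (assumed here at localhost:8083), with Hive integration turned on so the table is created for you. The connector name, topic, HDFS URL, and metastore URI below are placeholders; the full option list is in the connector documentation:

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object RegisterHdfsSink {
  def main(args: Array[String]): Unit = {
    // HDFS sink configuration as JSON, posted to the Connect REST API
    val config =
      """{
        |  "name": "hdfs-sink",
        |  "config": {
        |    "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
        |    "tasks.max": "1",
        |    "topics": "my-topic",
        |    "hdfs.url": "hdfs://namenode:8020",
        |    "flush.size": "10000",
        |    "hive.integration": "true",
        |    "hive.metastore.uris": "thrift://metastore:9083",
        |    "schema.compatibility": "BACKWARD"
        |  }
        |}""".stripMargin

    val request = HttpRequest.newBuilder()
      .uri(URI.create("http://localhost:8083/connectors"))
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(config))
      .build()

    val response = HttpClient.newHttpClient()
      .send(request, HttpResponse.BodyHandlers.ofString())
    println(s"${response.statusCode()} ${response.body()}")
  }
}
```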
Upvotes: 3