mbgsuirp

Reputation: 628

Writing data from HDFS to Kafka

Kafka is commonly used in ingestion pipelines in which the data is ultimately written to HDFS. Are there any designs where Kafka is used to transfer data from HDFS to external systems? I understand that Kafka is better suited as a messaging system, but can its publish-subscribe model be used for transferring data? In this use case, producers would write data from HDFS to the topics (one row at a time) and consumers would read it asynchronously.
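A minimal sketch of the design described above, assuming Python and plain-text files with one row per line. To keep it runnable without a cluster, the HDFS file is stubbed as any iterable of lines and the Kafka topic as a bounded in-memory `queue.Queue`. In a real pipeline you might read the file with an HDFS client such as `pyarrow.fs.HadoopFileSystem` and publish with a Kafka client such as kafka-python's `KafkaProducer.send`; neither is used here. The `None` end-of-stream marker is an assumption of this sketch only: Kafka topics have no such marker, and real consumers simply keep polling.

```python
import io
import queue
import threading
from typing import Iterable, List, Optional


def produce_from_file(lines: Iterable[str], topic: "queue.Queue[Optional[bytes]]") -> int:
    """Producer: publish one row per message, as described in the question.

    `lines` stands in for an HDFS file opened for reading (stubbed here);
    `topic` stands in for a Kafka topic (an in-memory queue in this sketch).
    """
    sent = 0
    for line in lines:
        row = line.rstrip("\n")
        if not row:
            continue  # skip blank lines
        # A real producer would call something like producer.send(topic_name, value=...)
        topic.put(row.encode("utf-8"))
        sent += 1
    topic.put(None)  # end-of-stream marker: an assumption of this sketch, not a Kafka feature
    return sent


def consume(topic: "queue.Queue[Optional[bytes]]", sink: List[str]) -> None:
    """Consumer: read messages asynchronously and hand them to the external system."""
    while True:
        msg = topic.get()
        if msg is None:
            break
        sink.append(msg.decode("utf-8"))  # e.g. write to a database or call an API


def transfer(lines: Iterable[str]) -> List[str]:
    """Run one producer and one asynchronous consumer over a single file."""
    topic: "queue.Queue[Optional[bytes]]" = queue.Queue(maxsize=1000)
    received: List[str] = []
    consumer = threading.Thread(target=consume, args=(topic, received))
    consumer.start()  # start the consumer first so the bounded queue cannot deadlock
    produce_from_file(lines, topic)
    consumer.join()
    return received


if __name__ == "__main__":
    fake_hdfs_file = io.StringIO("id,name\n1,alice\n\n2,bob\n")
    print(transfer(fake_hdfs_file))  # ['id,name', '1,alice', '2,bob']
```

The bounded queue gives a crude form of backpressure: if the consumer falls behind, `put` blocks the producer. With Kafka the two sides are decoupled instead, since the broker retains messages and consumers read at their own pace.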

There might be challenges in implementing this, such as data size, security, etc.

I am aware of other approaches, such as Sqoop, DistCp, etc.

Upvotes: 2

Views: 2716

Answers (1)

Lundahl

Reputation: 6572

You should be able to implement this using MapReduce or whatever framework you choose. I'd guess that something like Apache NiFi could do it out of the box, but I haven't tried that direction.
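One way to read "using MapReduce" is a map-only job in which each map task reads one HDFS file (or split) and publishes its rows to Kafka. The sketch below simulates that with a Python thread pool; the file paths, the three-partition topic, and the `mapper`/`run_map_only_job` names are hypothetical, and each partition is an in-memory queue rather than a real Kafka partition. Keying every message by its file path sends all rows of a file to the same partition, which preserves per-file row order, since Kafka only guarantees ordering within a partition.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from queue import Queue
from typing import Dict, List, Tuple

NUM_PARTITIONS = 3  # hypothetical topic with 3 partitions


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Deterministic key -> partition mapping (CRC32 stand-in, not Kafka's murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def mapper(path: str, lines: List[str], partitions: List[Queue]) -> int:
    """One map task per HDFS file/split: publish each non-blank row keyed by file path."""
    target = partitions[partition_for(path)]  # same key -> same partition -> order kept
    sent = 0
    for line in lines:
        row = line.rstrip("\n")
        if row:
            # A real mapper would send (key=path, value=row) through a Kafka producer
            target.put((path, row))
            sent += 1
    return sent


def run_map_only_job(splits: Dict[str, List[str]],
                     workers: int = 4) -> Tuple[Dict[str, int], List[Queue]]:
    """Run one mapper per split in parallel; return rows sent per split and the partitions."""
    partitions: List[Queue] = [Queue() for _ in range(NUM_PARTITIONS)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {path: pool.submit(mapper, path, lines, partitions)
                   for path, lines in splits.items()}
        counts = {path: f.result() for path, f in futures.items()}
    return counts, partitions


def drain(q: Queue) -> List[Tuple[str, str]]:
    """Collect everything currently in a partition (call only after the job finishes)."""
    items = []
    while not q.empty():
        items.append(q.get())
    return items
```

`zlib.crc32` is used only because it is deterministic across runs (Python's built-in `hash` of strings is not); Kafka's default partitioner hashes keys with murmur2, so real partition assignments would differ.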

Upvotes: 1
