kalyan chakri

Reputation: 31

Spring XD batch job to ingest data from Kafka to HDFS

How can I ingest data from Kafka into HDFS using a Spring XD batch job? I would like a batch job that is scheduled to run once a day. How can I track offsets in Kafka?

Upvotes: 0

Views: 423

Answers (1)

Ilayaperumal Gopinathan

Reputation: 4179

I assume the stream definition kafka | hdfs doesn't work for you, because you want to run the ingestion as a batch job that you can schedule and orchestrate.

In that case, XD does not yet provide an out-of-the-box batch job module that moves data from Kafka to HDFS, so you would need to implement a custom batch job module.

To read the Kafka messages, you need an ItemReader implementation that reads messages from the Kafka broker. AmqpItemReader takes a similar approach:

https://github.com/spring-projects/spring-batch/blob/master/spring-batch-infrastructure/src/main/java/org/springframework/batch/item/amqp/AmqpItemReader.java

The spring-integration-kafka project is a useful reference for the Kafka-specific part of the implementation: https://github.com/spring-projects/spring-integration-kafka
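
To make the reader idea more concrete, here is a minimal sketch of the shape such a reader could take, including one possible way to carry the offset from one daily run to the next. Everything in it is an assumption for illustration, not XD or Spring Batch code: SketchItemReader is a local stand-in that mirrors the ItemReader contract (read() returns null when there is nothing more to read), PartitionLog is a hypothetical abstraction where a real module would call a Kafka consumer, and the plain Map stands in for wherever the job persists its state (in Spring Batch that would typically be the ExecutionContext).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in mirroring the ItemReader contract:
// read() returns the next item, or null when the input is exhausted.
interface SketchItemReader<T> {
    T read() throws Exception;
}

// Hypothetical abstraction over a single Kafka partition.
// A real module would wrap a Kafka consumer behind this.
interface PartitionLog {
    // Returns the message stored at the given offset, or null if none exists yet.
    String fetch(long offset);
}

// Reads messages sequentially and remembers which offset to read next.
class OffsetTrackingReader implements SketchItemReader<String> {
    static final String OFFSET_KEY = "kafka.next.offset"; // hypothetical key name

    private final PartitionLog log;
    private long nextOffset;

    // Resume from the saved offset if a previous run stored one, else start at 0.
    OffsetTrackingReader(PartitionLog log, Map<String, Long> savedState) {
        this.log = log;
        this.nextOffset = savedState.getOrDefault(OFFSET_KEY, 0L);
    }

    @Override
    public String read() {
        String message = log.fetch(nextOffset);
        if (message == null) {
            return null; // nothing more to read in this run
        }
        nextOffset++;
        return message;
    }

    // Store the next offset so that a later run can continue from here.
    void saveState(Map<String, Long> state) {
        state.put(OFFSET_KEY, nextOffset);
    }
}

class KafkaReaderSketch {
    // Reads until the reader signals exhaustion by returning null.
    static List<String> drain(OffsetTrackingReader reader) {
        List<String> items = new ArrayList<>();
        String item;
        while ((item = reader.read()) != null) {
            items.add(item);
        }
        return items;
    }

    public static void main(String[] args) {
        // In-memory list standing in for a Kafka partition.
        List<String> partition = new ArrayList<>();
        partition.add("event-1");
        partition.add("event-2");
        PartitionLog log = offset ->
                offset < partition.size() ? partition.get((int) offset) : null;

        Map<String, Long> state = new HashMap<>();

        // First run: reads everything currently available, then saves the offset.
        OffsetTrackingReader firstRun = new OffsetTrackingReader(log, state);
        System.out.println(drain(firstRun));
        firstRun.saveState(state);

        // Second run: only messages added since the first run are read.
        partition.add("event-3");
        OffsetTrackingReader secondRun = new OffsetTrackingReader(log, state);
        System.out.println(drain(secondRun));
    }
}
```

In a real job module the items returned by read() would be handed to the HDFS writer chunk by chunk, and the offset would only be saved once a chunk has been written successfully, so that a failed run can be restarted without losing messages.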

To write the data into HDFS, XD already has org.springframework.xd.batch.item.hadoop.HdfsTextItemWriter.

Any of the existing XD batch job modules that write to HDFS can serve as a guide for implementing this. Feel free to open a JIRA issue; contributions are welcome.

Upvotes: 1
