Reputation: 31
How can I ingest data from Kafka into HDFS using a Spring XD batch job? I would like a batch job that is scheduled to run once a day. How can I track offsets in Kafka?
Upvotes: 0
Views: 423
Reputation: 4179
I assume the stream setup kafka | hdfs doesn't help you, because you want to run and orchestrate this as a batch job.
In that case, XD doesn't yet provide an out-of-the-box batch job module for kafka -> hdfs, so you would need to implement a custom batch job module.
To read the Kafka messages, you would need an ItemReader implementation that reads messages from the Kafka broker. Spring Batch's AmqpItemReader takes a similar approach.
For the Kafka-specific parts, spring-integration-kafka is a useful reference: https://github.com/spring-projects/spring-integration-kafka
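As a rough illustration of how such a reader could track offsets between runs, here is a minimal sketch. It is not the XD or Spring Batch API: to stay self-contained it uses no Spring or Kafka classes. PartitionSource is a hypothetical stand-in for a consumer bound to one Kafka partition, the Map stands in for whatever persistent store holds the offset between daily runs, and the key name is made up. The open/read/update methods only mirror the shape of Spring Batch's ItemStream and ItemReader, where read() returning null ends the step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KafkaOffsetReaderSketch {

    /** Hypothetical stand-in for a consumer bound to a single Kafka partition. */
    interface PartitionSource {
        /** Returns the message at the given offset, or null if nothing is there yet. */
        String fetch(long offset);
    }

    /** Mirrors the shape of ItemReader + ItemStream without depending on Spring Batch. */
    static class OffsetTrackingReader {
        static final String OFFSET_KEY = "kafka.next.offset"; // hypothetical key name

        private final PartitionSource source;
        private long nextOffset;

        OffsetTrackingReader(PartitionSource source) {
            this.source = source;
        }

        /** Like ItemStream.open: resume from the saved offset, or start at 0. */
        void open(Map<String, Object> offsetStore) {
            Object saved = offsetStore.get(OFFSET_KEY);
            nextOffset = (saved instanceof Long) ? (Long) saved : 0L;
        }

        /** Like ItemReader.read: returns null once the partition is drained. */
        String read() {
            String message = source.fetch(nextOffset);
            if (message != null) {
                nextOffset++;
            }
            return message;
        }

        /** Like ItemStream.update: record the next offset to read. */
        void update(Map<String, Object> offsetStore) {
            offsetStore.put(OFFSET_KEY, nextOffset);
        }
    }

    /** One scheduled run: drain what is available, then save the offset. */
    static List<String> runOnce(PartitionSource source, Map<String, Object> offsetStore) {
        OffsetTrackingReader reader = new OffsetTrackingReader(source);
        reader.open(offsetStore);
        List<String> batch = new ArrayList<>();
        for (String m = reader.read(); m != null; m = reader.read()) {
            batch.add(m);
        }
        reader.update(offsetStore);
        return batch;
    }

    public static void main(String[] args) {
        // In-memory list playing the role of a Kafka partition log.
        List<String> log = new ArrayList<>(Arrays.asList("a", "b"));
        PartitionSource source = offset -> offset < log.size() ? log.get((int) offset) : null;
        Map<String, Object> offsetStore = new HashMap<>();

        System.out.println(runOnce(source, offsetStore)); // prints [a, b]
        log.add("c"); // a new message arrives before the next daily run
        System.out.println(runOnce(source, offsetStore)); // prints [c]
    }
}
```

The second run picks up only the new message because the saved offset survives between runs. In a real job you would have to decide where that offset lives (for example the job's execution context on restart, or an external store); the sketch leaves that choice open.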
To write the data into HDFS, XD already has org.springframework.xd.batch.item.hadoop.HdfsTextItemWriter.
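To show how the write side could fit into a chunk-oriented step, here is a small sketch under stated assumptions. ChunkWriter is a hypothetical interface, loosely modelled on a batch item writer that receives a list of items; it is not the actual API of the XD HDFS writer, and the example writes to an in-memory list rather than to HDFS. The point it illustrates is counting progress only after a chunk has been written successfully, so that a failed write does not advance the position you would later resume from.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ChunkedHdfsWriteSketch {

    /** Hypothetical stand-in for an item writer that targets HDFS. */
    interface ChunkWriter {
        void write(List<String> items) throws Exception;
    }

    /**
     * Groups items into chunks of at most chunkSize and hands each chunk to the writer.
     * Returns the number of items written successfully; stops at the first failed
     * write so the caller can resume from that count.
     */
    static long copyInChunks(Iterator<String> items, ChunkWriter writer, int chunkSize) {
        long written = 0;
        List<String> chunk = new ArrayList<>(chunkSize);
        while (items.hasNext()) {
            chunk.add(items.next());
            if (chunk.size() == chunkSize || !items.hasNext()) {
                try {
                    writer.write(chunk);
                } catch (Exception e) {
                    return written; // only chunks written so far are counted
                }
                written += chunk.size();
                chunk = new ArrayList<>(chunkSize);
            }
        }
        return written;
    }

    public static void main(String[] args) {
        List<List<String>> sink = new ArrayList<>(); // in-memory stand-in for HDFS
        List<String> messages = Arrays.asList("a", "b", "c", "d", "e");

        long count = copyInChunks(messages.iterator(), sink::add, 2);

        System.out.println(count); // prints 5
        System.out.println(sink);  // prints [[a, b], [c, d], [e]]
    }
}
```

If the returned count is what gets saved as the new offset, a failed run re-reads the unwritten chunk next time, which gives at-least-once delivery into HDFS.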
Any of the existing XD batch job modules that write to HDFS can serve as a model for how to implement this. Feel free to open a JIRA issue; contributions are welcome.
Upvotes: 1