Reputation: 63
I'm moving a streaming application from Flume to Kafka, and since I'm new to Kafka I need some help.
I have a Windows machine on which CSV files are continuously generated by IoT sensors in a particular location, say D:/Folder. I want to transfer them to a Hadoop cluster.
Millions of small files are generated in the folder every day, and I want Kafka to spool the folder for any new files. Which Kafka Connect connector should I use to spool the directory for new files? I read about the Kafka Connect FileStream connector, but I think it only works with a single file.
Upvotes: 1
Views: 1623
Reputation: 1091
Use kafka-connect-spooldir. It supports reading all CSV files within a folder, rather than a single file.
https://www.confluent.io/hub/jcustenborder/kafka-connect-spooldir
https://docs.confluent.io/current/connect/kafka-connect-spooldir/
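As a rough sketch of what the connector definition might look like, the Python snippet below builds the JSON that could be submitted to the Kafka Connect REST API. The connector name, topic name, and the finished/error folders are hypothetical placeholders, not values from the question; only D:/Folder comes from the original post. It also assumes each CSV file has a header row and that letting the connector infer the schema is acceptable, so check the property names and values against the spooldir documentation for your version before using them.

```python
import json


def build_spooldir_config(input_path, finished_path, error_path, topic):
    """Return a connector definition for the SpoolDir CSV source connector."""
    return {
        # Hypothetical connector name, chosen for illustration only.
        "name": "csv-spooldir-source",
        "config": {
            "connector.class": (
                "com.github.jcustenborder.kafka.connect.spooldir."
                "SpoolDirCsvSourceConnector"
            ),
            "tasks.max": "1",
            "topic": topic,
            # Folder that is watched for new files.
            "input.path": input_path,
            # Folders where processed and failed files are moved (assumed paths).
            "finished.path": finished_path,
            "error.path": error_path,
            # Only pick up files ending in .csv.
            "input.file.pattern": r".*\.csv",
            # Assumption: the first row of each file holds the column names.
            "csv.first.row.as.header": "true",
            # Assumption: let the connector infer the schema from the data.
            "schema.generation.enabled": "true",
        },
    }


payload = build_spooldir_config(
    input_path="D:/Folder",
    finished_path="D:/Folder-finished",  # hypothetical
    error_path="D:/Folder-error",        # hypothetical
    topic="iot-sensor-csv",              # hypothetical
)
print(json.dumps(payload, indent=2))
```

The printed JSON could then be sent with a POST request to the /connectors endpoint of a running Kafka Connect worker. The finished and error folders need to exist and be writable by the worker before the connector starts.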
Upvotes: 1