Reputation: 997
I have a requirement to subscribe to real-time message files published from an existing MQ setup on the customer side, using a pub-sub model. When I subscribe to the MQ topic to receive the message files, should I use Kafka ONLY to get the files, and then process them and store them in my file system using a language of my choice, such as Python? I am expecting a setup like the one below:
Customer's MQ ----> Kafka setup <---- API to receive & process (Python) ----> File system
Once the files are published to Kafka, should I use Python to talk to the Kafka broker and receive the files for further processing?
Note: I don't want the contents of a message file to be split across different partitions. Instead, I want the full file to be published and consumed.
Upvotes: 1
Views: 2177
Reputation: 191864
I want the full file to be published and consumed.
Kafka is not meant to be used for file delivery. It has a default maximum message size of only about 1 MB, and if you raise this much higher than maybe 5 MB, you'll just overload the brokers' connections and storage.
Instead, you should set up shared storage (such as an FTP server, NAS, HDFS, S3, etc.), send only the URI of the file via Kafka, and connect to that storage after you read a message in the consumer. That way the messages stay small, and you don't need to deal with partitions and ordering, because the messages are just references to an external system where the whole files are stored.
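As a rough sketch of that idea (not a definitive implementation), the Kafka message could be a small JSON document that only points at the file. Everything named here is an assumption for illustration: the JSON field names, the `s3://` style URI, the helper names, and the `fetch_file` callback are hypothetical. `producer` and `consumer` are assumed to be kafka-python `KafkaProducer` and `KafkaConsumer` objects created elsewhere.

```python
import json
from pathlib import PurePosixPath
from urllib.parse import urlparse


def build_file_reference(uri: str, size_bytes: int) -> bytes:
    """Encode a small JSON message that points at a file stored elsewhere."""
    return json.dumps({"uri": uri, "size_bytes": size_bytes}).encode("utf-8")


def parse_file_reference(payload: bytes) -> dict:
    """Decode the JSON message and add the file name taken from the URI path."""
    ref = json.loads(payload.decode("utf-8"))
    ref["filename"] = PurePosixPath(urlparse(ref["uri"]).path).name
    return ref


def publish_reference(producer, topic: str, uri: str, size_bytes: int) -> None:
    """Send only the reference, never the file body.

    `producer` is assumed to behave like kafka-python's KafkaProducer,
    i.e. it offers send(topic, key=..., value=...).
    """
    producer.send(
        topic,
        key=uri.encode("utf-8"),
        value=build_file_reference(uri, size_bytes),
    )


def consume_references(consumer, fetch_file) -> None:
    """Hand every parsed reference to `fetch_file`.

    `consumer` is assumed to be iterable and to yield records with a
    `.value` attribute (as kafka-python's KafkaConsumer does).
    `fetch_file` is a hypothetical callback that downloads the file
    from the shared storage and processes it.
    """
    for record in consumer:
        fetch_file(parse_file_reference(record.value))
```

With this shape each message is a few hundred bytes regardless of the file size, and the consumer decides when and how to pull the actual file from the shared storage.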
shall I use Python to talk to Kafka broker to receive the files for further processing?
Any language would work.
Upvotes: 2
Reputation: 919
Kafka stores messages in (K,V) format. With the default partitioner, all messages with the same key are pushed to the same partition, although a single partition can also hold messages with different keys. So as long as your producer pushes the data of a file using a consistent key, e.g. the filename, it will be stored in a single partition.
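To illustrate why a consistent key keeps one file together, here is a minimal sketch of key-based partition selection. It is only an illustration under stated assumptions: it uses CRC32 as a stand-in hash, whereas Kafka's default partitioner uses murmur2, so the partition numbers will not match a real cluster. The function names and the chunking scheme are hypothetical.

```python
import zlib
from typing import List, Tuple


def pick_partition(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash.

    CRC32 is a stand-in here; Kafka's default partitioner uses murmur2.
    The principle is the same: same key and same partition count give
    the same partition.
    """
    return zlib.crc32(key) % num_partitions


def split_into_chunks(data: bytes, chunk_size: int) -> List[bytes]:
    """Cut the file contents into pieces of at most `chunk_size` bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def plan_file_messages(
    filename: str, data: bytes, chunk_size: int, num_partitions: int
) -> List[Tuple[int, bytes, int, bytes]]:
    """Return (partition, key, chunk_index, chunk) for every chunk of one file.

    Every chunk carries the filename as its key, so all chunks land in
    the same partition and keep their relative order there.
    """
    key = filename.encode("utf-8")
    partition = pick_partition(key, num_partitions)
    chunks = split_into_chunks(data, chunk_size)
    return [(partition, key, index, chunk) for index, chunk in enumerate(chunks)]
```

Note that the key-to-partition mapping depends on the number of partitions, so adding partitions to the topic later can send new messages for an existing key to a different partition.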
Now, you can use any programming language to push messages to Kafka. However, I would recommend Java, because the latest Kafka features are available in the Java client right away. As far as I understand, the confluent-kafka Python client is built on the librdkafka C library (kafka-python is a separate, pure-Python client), so new features can reach the Python clients later than the Java one.
Upvotes: 2