Reputation: 136
I'm very new to Apache Storm and to the wide range of message-queue options available. The current system reads files from a data store (text, binary, anything) and passes them to Apache Solr for indexing. However, these files also need additional processing, which is where Storm comes in. It appears that during Solr's UpdateRequestProcessorChain I can write the file being processed to a message broker, which I can then consume with Storm to do some parallel real-time processing.
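To make the intended flow concrete, here is a minimal Python sketch that replaces the broker with an in-process `queue.Queue`. `index_and_publish` (standing in for a Solr update-processor step) and `worker` (standing in for a Storm spout/bolt) are hypothetical names, not Solr or Storm APIs; a real deployment would use an actual broker and Storm's own components.

```python
import queue
import threading

STOP = object()  # sentinel telling a worker to exit (demo only)

def index_and_publish(doc_id, payload, broker):
    """Hypothetical stand-in for a Solr update-processor step:
    after indexing (omitted here), hand the document to the broker."""
    broker.put({"id": doc_id, "payload": payload})

def worker(broker, results, lock):
    """Hypothetical stand-in for a Storm spout/bolt pulling from the broker."""
    while True:
        msg = broker.get()
        if msg is STOP:
            break
        # Placeholder "processing": record the payload size.
        with lock:
            results[msg["id"]] = len(msg["payload"])

def run_pipeline(docs, n_workers=4):
    broker = queue.Queue(maxsize=100)  # bounded, so a slow consumer applies back-pressure
    results, lock = {}, threading.Lock()
    threads = [threading.Thread(target=worker, args=(broker, results, lock))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for doc_id, payload in docs.items():
        index_and_publish(doc_id, payload, broker)
    for _ in threads:  # one stop signal per worker, queued after all documents
        broker.put(STOP)
    for t in threads:
        t.join()
    return results

print(sorted(run_pipeline({"a.txt": b"hello", "b.bin": b"\x00\x01"}).items()))
# [('a.txt', 5), ('b.bin', 2)]
```

The bounded queue matters for the question's workload: if the consumers fall behind, the producer blocks instead of buffering without limit.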
I am expecting an average of 10,000 requests per second at 4 KB per message (roughly 40 MB/s). However, there is a possibility (albeit very rare) of a 100 GB+ file being passed in over several seconds. Is there a message queue that would still work well under those requirements?
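A quick back-of-envelope calculation shows how far apart the two cases are. This sketch treats KB and GB as binary units (KiB, GiB), and the 5-second window for the large file is an assumption, since the question only says "several seconds".

```python
# Back-of-envelope load figures from the question.
# Assumption: KB and GB are read as binary units (KiB = 1024 B, GiB = 1024**3 B).
KIB = 1024
GIB = 1024 ** 3

def steady_rate(msgs_per_sec, msg_bytes):
    """Average bytes per second for the normal message stream."""
    return msgs_per_sec * msg_bytes

def burst_rate(file_bytes, seconds):
    """Bytes per second if one large file arrives over the given window."""
    return file_bytes / seconds

steady = steady_rate(10_000, 4 * KIB)        # 40,960,000 B/s, about 39 MiB/s
burst = burst_rate(100 * GIB, 5)             # assumed 5-second window: 20 GiB/s
ratio = burst / steady                       # about 524x the normal rate
msgs_in_big_file = (100 * GIB) // (4 * KIB)  # 26,214,400 messages of 4 KiB

print(f"steady: {steady / 2**20:.1f} MiB/s")
print(f"burst:  {burst / GIB:.1f} GiB/s ({ratio:.0f}x steady)")
print(f"100 GiB = {msgs_in_big_file:,} messages of 4 KiB")
```

Cut into 4 KiB pieces, one 100 GiB file equals about 2,621 seconds (roughly 44 minutes) of normal traffic at 10,000 messages per second, so the rare case dominates capacity planning far more than its frequency suggests.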
I already looked into Kafka, which seems to be optimized for messages of around 1 KB. RabbitMQ does not seem to handle large files well. ActiveMQ does seem to have Blob Messages for sending large files. Does anyone have experience with any of these, or with others?
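One common way to keep rare huge payloads off the broker is the claim-check pattern, similar in spirit to ActiveMQ Blob Messages: small payloads travel inline, while large ones are written to external storage and only a reference is queued. A hedged Python sketch follows; `MAX_INLINE_BYTES` is an assumed 1 MiB ceiling (of the same order as Kafka's default maximum message size, but check your broker's actual configuration), and the plain dict `blob_store` is a hypothetical stand-in for something like HDFS, S3, or a shared filesystem.

```python
import hashlib

# Assumed per-message ceiling for the broker; not taken from any product's config.
MAX_INLINE_BYTES = 1024 * 1024

def make_message(payload, blob_store, max_inline=MAX_INLINE_BYTES):
    """Return a queue message: inline bytes if small, otherwise a reference."""
    if len(payload) <= max_inline:
        return {"kind": "inline", "data": payload}
    # Content-addressed key, so re-sending the same file does not duplicate storage.
    key = hashlib.sha256(payload).hexdigest()
    blob_store[key] = payload  # hypothetical external store (HDFS, S3, NFS, ...)
    return {"kind": "ref", "key": key, "size": len(payload)}

def resolve(message, blob_store):
    """Consumer side: get the payload back regardless of how it was sent."""
    if message["kind"] == "inline":
        return message["data"]
    return blob_store[message["key"]]

store = {}
small = make_message(b"x" * 4096, store)
large = make_message(b"y" * (MAX_INLINE_BYTES + 1), store)
print(small["kind"], large["kind"], len(store))  # inline ref 1
```

For simplicity this sketch holds each payload in memory; a real 100 GB file would be streamed to storage rather than hashed and copied as one bytes object.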
Upvotes: 0
Views: 518
Reputation: 1407
I don't think putting a 100 GB+ file into any message queue is a good idea. Instead, you can preprocess the file and break it into manageable chunks before putting it into the queue. Add some kind of ID to each chunk so that you can relate the different chunks of a file to each other when processing them in Storm. It is also not a good idea to store a very large file as a single document in Solr.
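A minimal Python sketch of this chunk-and-tag approach follows. The message layout (`file_id`, `seq`, `last`, `data`), the 4 KiB `CHUNK_SIZE`, and the `Reassembler` helper are hypothetical choices for illustration, not part of Storm or any broker API. The producer tags every chunk with the file's ID and a sequence number; the consumer buffers chunks per file until the chunk marked `last` and every earlier sequence number have arrived, which tolerates out-of-order and duplicate delivery.

```python
import io
import uuid

CHUNK_SIZE = 4 * 1024  # hypothetical; close to the typical message size in the question

def chunk_stream(stream, file_id, chunk_size=CHUNK_SIZE):
    """Producer side: yield tagged chunks without loading the whole file."""
    seq = 0
    current = stream.read(chunk_size)
    while True:
        # Read one chunk ahead so the final chunk can be flagged as last.
        nxt = stream.read(chunk_size) if current else b""
        yield {"file_id": file_id, "seq": seq, "last": not nxt, "data": current}
        if not nxt:
            break
        seq += 1
        current = nxt

class Reassembler:
    """Consumer side: buffer chunks per file_id until the file is complete."""

    def __init__(self):
        self._chunks = {}  # file_id -> {seq: data}
        self._last = {}    # file_id -> seq of the chunk flagged as last

    def add(self, msg):
        fid = msg["file_id"]
        self._chunks.setdefault(fid, {})[msg["seq"]] = msg["data"]  # duplicates overwrite
        if msg["last"]:
            self._last[fid] = msg["seq"]
        last = self._last.get(fid)
        if last is not None and len(self._chunks[fid]) == last + 1:
            parts = self._chunks.pop(fid)
            del self._last[fid]
            return b"".join(parts[i] for i in range(last + 1))
        return None  # file not complete yet

if __name__ == "__main__":
    data = bytes(range(256)) * 100  # 25,600 bytes -> 7 chunks of up to 4 KiB
    msgs = list(chunk_stream(io.BytesIO(data), str(uuid.uuid4())))
    r = Reassembler()
    done = [out for m in reversed(msgs) if (out := r.add(m)) is not None]
    print(len(msgs), done == [data])  # 7 True
```

In Storm, routing the chunk stream with a fields grouping on `file_id` would send every chunk of a given file to the same bolt task, which is what makes a per-task reassembly buffer like this workable. For a truly huge file you would usually process chunks incrementally rather than reassemble the whole thing in memory.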
Upvotes: 1