Reputation: 43
I need to store the messages pushed to Kafka in deep storage. We are using Azure cloud services, so I suppose Azure Blob Storage would be a good option, and I want to use Kafka Connect's sink connector API to push the data there. The Kafka documentation mostly suggests HDFS for exporting data, but in that case I would need a Linux VM running Hadoop, which I guess would be costly. My question is: is Azure Blob Storage an appropriate choice for storing JSON objects, and is building a custom sink connector a reasonable solution for this case?
Upvotes: 4
Views: 9445
Reputation: 41
If someone is looking for an open-source alternative to the Kafka sink connector for Azure Blob Storage, I have developed one here.
It has all the features that are in the enterprise version.
Upvotes: 1
Reputation: 12088
If anyone bumps into this question now, you should know that there is now a Kafka Connect sink for Azure Blob Storage.
Upvotes: 3
Reputation: 1431
A custom sink connector definitely works. Kafka Connect was absolutely designed so you could plug in connectors; in fact, connector development is entirely federated. Confluent's JDBC and HDFS connectors were implemented first simply due to the popularity of those two use cases, but there are many more (we keep a list of the connectors we're aware of here).
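For a sense of the plumbing involved, here is a minimal sketch of the Connect sink API's connector side. The `AzureBlobSinkConnector`/`AzureBlobSinkTask` names, config handling, and version string are made up for illustration, not a published connector:

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.Task;
import org.apache.kafka.connect.sink.SinkConnector;

// Hypothetical connector class; Connect instantiates it from the
// "connector.class" property of a connector config.
public class AzureBlobSinkConnector extends SinkConnector {
    private Map<String, String> configProps;

    @Override
    public void start(Map<String, String> props) {
        configProps = props; // e.g. container name, connection string
    }

    @Override
    public Class<? extends Task> taskClass() {
        return AzureBlobSinkTask.class;
    }

    @Override
    public List<Map<String, String>> taskConfigs(int maxTasks) {
        // Every task gets the same config; Connect balances the
        // topic partitions across the tasks for you.
        return Collections.nCopies(maxTasks, configProps);
    }

    @Override
    public void stop() {}

    @Override
    public ConfigDef config() {
        return new ConfigDef(); // declare & validate settings here
    }

    @Override
    public String version() {
        return "0.1.0";
    }
}
```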
In terms of whether Azure Blob Storage is appropriate, you mention JSON objects. I think the only thing you'll want to consider is the size of the objects and whether Azure Storage will handle the size and number of objects well. I am not sure about Azure Storage's characteristics, but in many other object storage systems you might need to aggregate many objects into a single blob to get good performance for a large number of objects (i.e. you might need a file format that supports many JSON objects).
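To illustrate that aggregation point, the matching sink task could buffer records in `put()` and write one newline-delimited JSON blob per `flush()` instead of one blob per message. A rough sketch against the azure-storage-blob v12 client, assuming record values arrive as JSON strings (e.g. via `StringConverter`) and using hypothetical config keys (`azblob.connection.string`, `azblob.container`); exactly-once delivery and retries are omitted:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collection;
import java.util.Map;

import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

import com.azure.storage.blob.BlobContainerClient;
import com.azure.storage.blob.BlobServiceClientBuilder;

public class AzureBlobSinkTask extends SinkTask {
    private BlobContainerClient container;
    private final StringBuilder buffer = new StringBuilder();

    @Override
    public void start(Map<String, String> props) {
        // Hypothetical config keys; validate via ConfigDef in practice.
        container = new BlobServiceClientBuilder()
                .connectionString(props.get("azblob.connection.string"))
                .buildClient()
                .getBlobContainerClient(props.get("azblob.container"));
    }

    @Override
    public void put(Collection<SinkRecord> records) {
        // Buffer instead of writing a blob per message: lots of tiny
        // objects is exactly the performance trap described above.
        for (SinkRecord record : records) {
            // Assumes the configured converter yields JSON strings.
            buffer.append(record.value()).append('\n');
        }
    }

    @Override
    public void flush(Map<TopicPartition, OffsetAndMetadata> offsets) {
        if (buffer.length() == 0) return;
        byte[] bytes = buffer.toString().getBytes(StandardCharsets.UTF_8);
        // One newline-delimited JSON blob per flush interval.
        String blobName = "kafka/" + System.currentTimeMillis() + ".ndjson";
        container.getBlobClient(blobName)
                 .upload(new ByteArrayInputStream(bytes), bytes.length);
        buffer.setLength(0);
    }

    @Override
    public void stop() {}

    @Override
    public String version() {
        return "0.1.0";
    }
}
```

Rolling files by flush interval (or by size) keeps the blob count manageable, and newline-delimited JSON is one simple file format that packs many JSON objects into a single blob.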
Upvotes: 4