moe

Reputation: 1

Kafka HDFS Connector - Without Full Confluent

I have a running instance of Kafka 0.10 and I'm currently using Gobblin to store data in HDFS. I want to switch to Kafka Connect, and in my research I found that Confluent provides an HDFS connector.

However, is there a way to use this connector without using the entire Confluent Platform? For example, can I copy the relevant scripts from the Confluent source and somehow make my Kafka instance use them? I'm still learning my way through this stuff, so I'm not yet very well versed in this space.

Thanks.

Upvotes: 0

Views: 755

Answers (1)

Yuriy Tseretyan

Reputation: 1716

Yes, it is possible. I've done that: I use a slightly modified Confluent HDFS standalone connector that runs in a Docker container. However, you will have to use Schema Registry too, because the connectors are tightly coupled to it. You will also have to send messages in a special format: to support automatic schema recognition, Confluent's Kafka consumers introduce an internal message format. Therefore, to be compatible with Confluent consumers, your producers must compose messages according to the following format.

  • Header (5 bytes)
    • The first byte of the message (the "magic byte") should always be 0
    • The next 4 bytes should be the ID of the schema in Schema Registry, encoded in big-endian format.
  • Payload (Avro/Parquet object, binary encoded).

PS: Be very careful when sending messages to the topic, because if a message does not match its schema, or a schema with that ID does not exist in the registry, the consumer fails silently: the worker thread stops, but the application still hangs in memory and does not exit.

Upvotes: 1

Related Questions