Reputation: 37
When I restart my HDFS Sink Connector, will I receive messages from the last offset that I read? How does a connector know from where it has to restart?
Upvotes: 0
Views: 209
Reputation: 18475
Internally, Kafka Connect is just a Kafka client that uses the standard Producer and Consumer APIs. In the case of a sink connector, it is the Consumer API that works under the covers.
Therefore, your sink connector behaves the same way a standard consumer would. A consumer makes use of the Consumer Group concept to manage offsets, i.e. it commits (processed) offsets back to the broker. If you now restart your sink connector, it knows exactly where to continue reading from the source topic.
The benefit of having a standard consumer under the covers is that you can apply the typical consumer configurations. Just make sure to take the following note into account:
"For configuration of the producers used by Kafka source tasks and the consumers used by Kafka sink tasks, the same parameters can be used but need to be prefixed with producer. and consumer. respectively."
The Consumer Group (group.id) gets created automatically by Kafka Connect based on the connector name you are using.
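As an illustration (the connector name and class below are just an example for an HDFS sink): recent Kafka Connect versions derive the group name as connect-<connector name>, so a connector configured like this would use the group connect-hdfs-sink, whose committed offsets you can inspect with the standard kafka-consumer-groups.sh tool.

    # Connector configuration (illustrative)
    name=hdfs-sink
    connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
    topics=my-topic
    # Resulting consumer group: connect-hdfs-sink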
You can specify the Consumer Group (group.id) in your worker properties. In distributed mode, remember that
"all workers with the same group.id will be in the same connect cluster. For example, if worker-a has group.id=connect-cluster-a and worker-b has the same group.id, worker-a and worker-b will form a cluster called connect-cluster-a"
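A minimal sketch of the relevant part of a distributed worker configuration, assuming a cluster named connect-cluster-a (the broker address and topic names are placeholders):

    # connect-distributed.properties (excerpt)
    bootstrap.servers=localhost:9092
    group.id=connect-cluster-a
    # Topics where Connect stores connector configs, offsets and status
    config.storage.topic=connect-configs
    offset.storage.topic=connect-offsets
    status.storage.topic=connect-status

Note that offset.storage.topic holds offsets for source connectors only; sink connectors such as the HDFS sink commit their offsets through the regular consumer group mechanism described above.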
Upvotes: 1