Jaime Caffarel

Reputation: 2469

Is the poll() method of the Kafka Connect SourceTask class thread-safe?

I'm using Kafka Connect to send data from a Web Service to Kafka (version 0.10.1) in distributed mode. The poll() method documentation states that:

Poll this SourceTask for new records. This method should block if no data is currently available.

However, I'm not sure whether Kafka Connect can call poll() from more than one thread. The requests to the Web Service are very time-consuming and are made inside that method, so I want to avoid sending them more than once.

There is a question stating that, prior to version 0.10.2.1-cp2, poll() could be called by different threads. However, I can't confirm from the release notes whether this still happens in newer versions.

Upvotes: 0

Views: 676

Answers (1)

Randall Hauch

Reputation: 7197

Kafka Connect tasks do not need to be thread-safe from the perspective of the framework, and tasks should not assume they can communicate with other tasks or connectors via static mechanisms. Of course, multiple tasks may be running at the same time, each in its own thread.

As of at least Kafka 0.11.0.0, the methods on each SourceTask and SinkTask instance are called from a thread dedicated to that task. The same thread is used for the lifetime of the task instance, until it is stopped. Even when the task is paused, the thread simply blocks.
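
The threading model described above can be pictured with a small, stdlib-only Java sketch. It does not use the Kafka Connect API: ToyTask and DedicatedThreadModel are hypothetical names that only imitate a worker driving one task instance from one dedicated thread.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a source task; NOT the Kafka Connect SourceTask class.
class ToyTask {
    // Names of every thread that has called poll() on this instance.
    final Set<String> callerThreads = ConcurrentHashMap.newKeySet();

    String poll() {
        callerThreads.add(Thread.currentThread().getName());
        return "record";
    }
}

// Hypothetical model of a worker: one thread per task, reused for every poll().
class DedicatedThreadModel {
    static Set<String> runOnDedicatedThread(ToyTask task, String threadName, int polls) {
        Thread worker = new Thread(() -> {
            for (int i = 0; i < polls; i++) {
                task.poll();
            }
        }, threadName);
        worker.start();
        try {
            worker.join(); // wait until the task's thread has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for task thread", e);
        }
        return task.callerThreads;
    }
}
```

In this model each task instance only ever sees one caller thread, while two different task instances see two different threads. That is why per-instance state needs no locking, but static state shared between tasks would.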

My understanding is that this has been the behavior since at least 0.10.1.0, but you'd have to check the codebase to be sure.
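
If you would rather not rely on that for an older version, one option is to make the task fail loudly if poll() is ever entered twice at once. Below is a minimal sketch using only the JDK. SingleCallerGuard is a hypothetical helper, not part of Kafka Connect, and the idea of wrapping the body of poll() with it is an assumption about how you might use it.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Hypothetical helper; not part of Kafka Connect.
class SingleCallerGuard {
    private final AtomicBoolean inProgress = new AtomicBoolean(false);

    // Runs body, or throws if another call is already in progress.
    <T> T run(Supplier<T> body) {
        if (!inProgress.compareAndSet(false, true)) {
            throw new IllegalStateException("poll() entered while another call is in progress");
        }
        try {
            return body.get();
        } finally {
            inProgress.set(false); // always release, even if body throws
        }
    }
}
```

Inside a task, the expensive Web Service request would go in the body passed to run(...). An overlapping call then throws instead of silently sending the request a second time. If you would rather have the second caller wait than fail, a ReentrantLock or a synchronized block would be the alternative.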

Upvotes: 1
