Reputation: 501
I'm trying to run an HDFS Source Connector and a FileStream Source Connector. I was wondering how it would work if we set tasks.max > 1. Isn't it the connector's job to make sure that the parallelism is handled correctly?
For example, wouldn't it be a problem for the FileStream Source Connector if more than one task accessed the file? How would the connector know which line is being read by which task, and how would it make sure the tasks don't clash?
Or should tasks.max=1 be set for connectors where such a problem can occur?
Upvotes: 4
Views: 2352
Reputation: 4375
There is no such problem, since according to the docs:
tasks.max - The maximum number of tasks that should be created for this connector. The connector may create fewer tasks if it cannot achieve this level of parallelism.
For example, the FileStream Source Connector simply ignores tasks.max and always runs a single task, while for the JDBC Source Connector the actual number of tasks is the minimum of tasks.max and the number of tables.
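To illustrate how a connector can treat tasks.max as an upper bound only, here is a minimal sketch in plain Java. It does not use the Kafka Connect API: the class `TaskPlanSketch` and both methods are hypothetical stand-ins for what a connector's `taskConfigs(int maxTasks)` method might do, and the round-robin assignment of tables to tasks is an assumption, not the JDBC connector's actual code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names are illustrative and not part of Kafka Connect.
class TaskPlanSketch {

    // JDBC-style planning: never create more tasks than there are tables.
    static List<Map<String, String>> taskConfigs(List<String> tables, int maxTasks) {
        int numTasks = Math.min(tables.size(), maxTasks);
        List<Map<String, String>> configs = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            // Assumed strategy: assign tables to tasks round-robin.
            List<String> assigned = new ArrayList<>();
            for (int j = i; j < tables.size(); j += numTasks) {
                assigned.add(tables.get(j));
            }
            Map<String, String> config = new LinkedHashMap<>();
            config.put("tables", String.join(",", assigned));
            configs.add(config);
        }
        return configs;
    }

    // FileStream-style planning: maxTasks is ignored, one task owns the file.
    static List<Map<String, String>> singleFileTaskConfigs(String file, int maxTasks) {
        return List.of(Map.of("file", file));
    }

    public static void main(String[] args) {
        // Three tables with tasks.max=2 yields two task configs.
        System.out.println(taskConfigs(List.of("orders", "users", "items"), 2));
        // One file with tasks.max=8 still yields a single task config.
        System.out.println(singleFileTaskConfigs("/tmp/input.txt", 8));
    }
}
```

Because the connector itself returns the list of task configurations, the framework starts only as many tasks as that list contains, so setting tasks.max higher than the connector can use does not cause two tasks to read the same file.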
Upvotes: 5