Reputation: 123
We are trying to copy data from a database table into Kafka using the Confluent JDBC Source connector. The problem is that the data in that table gets updated exactly once every night, so we would like to copy the latest data after the table has been updated; for example, we would like the connector to run at 7am every day. Is that possible?
The documentation only shows the poll.interval.ms property. We could use that if we set it to 24 hours, but then we would have to start the connector at exactly 7am, which is not really an elegant solution. Is there a better way to do that?
Thank you!
Upvotes: 4
Views: 2522
Reputation: 32100
Kafka, and Kafka Connect, are not really designed for "batch". That is, you can use them in a batch-driven way, but the concept of integrating them into the kind of scheduled, daisy-chained workflow you describe is not native to them.
The polling interval of the JDBC connector exists so that you can periodically check for new data, at a rate that strikes the right balance for you between load on the source system (from polling) and latency of the received data.
Why not set the connector to poll every few minutes (or a few times an hour; whatever suits you)? Once the new data is available, it'll pull it in. No new data, no new records.
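For illustration, here's a minimal sketch of registering a JDBC Source connector with a short polling interval via the Kafka Connect REST API. The worker URL, connector name, table, column, and database URL are hypothetical placeholders; poll.interval.ms is the property that controls the polling rate.

```python
import requests

# Hypothetical Connect worker URL and connector name; adjust for your setup.
CONNECT_URL = "http://localhost:8083"
CONNECTOR_NAME = "jdbc-nightly-source"

config = {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://db.example.com:5432/mydb",  # placeholder
    "mode": "timestamp",                    # pick up rows changed since the last poll
    "timestamp.column.name": "updated_at",  # hypothetical audit column
    "table.whitelist": "my_table",          # hypothetical table name
    "topic.prefix": "jdbc-",
    "poll.interval.ms": str(5 * 60 * 1000), # poll every 5 minutes
}

# PUT to /connectors/{name}/config creates the connector or updates its config.
resp = requests.put(f"{CONNECT_URL}/connectors/{CONNECTOR_NAME}/config", json=config)
resp.raise_for_status()
print(resp.json())
```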
Alternatively, you can use the Kafka Connect REST API to control the connector programmatically: before your nightly load starts, pause the connector; once the load has finished, resume it. If you use pause/resume, note that you'll still want to set the polling interval appropriately so that the resumed connector picks the new data up promptly. You could also simply delete and recreate the connector each time.
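A minimal sketch of the pause/resume approach, assuming the same hypothetical worker URL and connector name as above; PUT /connectors/{name}/pause and /connectors/{name}/resume are the standard Connect REST endpoints:

```python
import requests

CONNECT_URL = "http://localhost:8083"   # hypothetical Connect worker
CONNECTOR_NAME = "jdbc-nightly-source"  # hypothetical connector name

def pause_connector():
    # Stops the connector's tasks; call this before the nightly table load.
    requests.put(f"{CONNECT_URL}/connectors/{CONNECTOR_NAME}/pause").raise_for_status()

def resume_connector():
    # Restarts the tasks; call this (e.g. at 7am) once the load has finished.
    requests.put(f"{CONNECT_URL}/connectors/{CONNECTOR_NAME}/resume").raise_for_status()
```

You would trigger these two calls from whatever orchestrates your nightly load (a scheduler, the load job itself, etc.).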
You might also consider a log-based CDC (change data capture) approach instead, which has its own pros and cons.
Upvotes: 4