andryushka x

Reputation: 35

How does the Confluent S3 source connector know which files it has already ingested and which ones are new?

https://docs.confluent.io/kafka-connect-s3-source/current/

I think this connector polls S3 for a list of files, but does it keep state about which ones it has processed and which ones are new? If it does store state, where is that state stored?

Upvotes: 1

Views: 541

Answers (1)

OneCricketeer

Reputation: 191738

In general, source connectors store their state in the Connect worker's offsets topic, which is set with offset.storage.topic in distributed mode (standalone mode uses a local file set with offset.storage.file.filename instead). I've not used this particular connector, but I imagine it has to depend on monotonically increasing S3 keys, such as those written by the corresponding S3 sink, and therefore shouldn't be expected to work for any arbitrary S3 bucket.
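To make the idea concrete, here is a minimal Python sketch of how offset-based tracking could work if the keys sort in write order. This is not the connector's actual implementation (I haven't read its source); the function names, the in-memory `offsets` dict standing in for the offsets topic, and the sample key layout are all assumptions for illustration.

```python
from typing import Dict, List, Optional


def select_new_keys(listed_keys: List[str], last_key: Optional[str]) -> List[str]:
    """Return the keys that sort strictly after the last committed key."""
    ordered = sorted(listed_keys)
    if last_key is None:
        # Nothing committed yet, so every listed key counts as new.
        return ordered
    return [key for key in ordered if key > last_key]


def poll_once(listed_keys: List[str], offsets: Dict[str, str], bucket: str) -> List[str]:
    """Simulate one poll: pick the new keys, then record the highest one seen.

    `offsets` is a hypothetical stand-in for the Connect offsets topic,
    mapping a source partition (here just the bucket name) to the last key read.
    """
    new_keys = select_new_keys(listed_keys, offsets.get(bucket))
    if new_keys:
        offsets[bucket] = new_keys[-1]
    return new_keys


if __name__ == "__main__":
    offsets: Dict[str, str] = {}
    # Illustrative keys with zero-padded offsets, so string order matches write order.
    keys = [
        "topics/t/partition=0/t+0+0000000000.json",
        "topics/t/partition=0/t+0+0000000100.json",
    ]
    print(poll_once(keys, offsets, "my-bucket"))  # both keys are new
    print(poll_once(keys, offsets, "my-bucket"))  # nothing new on the second poll
```

Note what happens with a key that sorts before the stored offset: it is never picked up, even though it was uploaded later. That's the reason I wouldn't expect this approach to work on a bucket whose keys aren't ordered.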

There are some details about the regular file source connector in this post.

Upvotes: 0
