Reputation: 870
We have events coming into Kafka, and we use Kafka Connect to sync these events to AWS S3. The data appears in S3 in the directory structure below:
bucket_name/sub_folder/
Partition=0/events.json
Partition=1/events.json
Partition=2/events.json
Is there a way to store the data in one of the directory structures below instead?
bucket_name/sub_folder/date=today_date/events.json
bucket_name/sub_folder/Partition=0..2/date=today_date/events.json
The motivation is to store each day's events in that day's directory. I searched the web but could not find a way to do this. Thanks in advance.
Upvotes: 0
Views: 266
Reputation: 32090
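You can use the TimeBasedPartitioner, which partitions data by time. By default it uses ingestion (wall-clock) time; setting timestamp.extractor to Record or RecordField makes it use the record's own timestamp instead.
E.g. for hourly partitioning on a timestamp field of the record: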
[…]
"partitioner.class": "io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
"path.format": "'year'=YYYY/'month'=MM/'day'=dd/'hour'=HH",
"locale": "US",
"timezone": "UTC",
"partition.duration.ms": "3600000",
"timestamp.extractor": "RecordField",
"timestamp.field": "my_record_field_with_timestamp_in",
[…]
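For the daily layout asked about in the question, the same partitioner can likely be adjusted as in the sketch below. This is a fragment to merge into the connector config, not a complete one. The `date` label and the `YYYY-MM-dd` pattern are assumptions chosen to match the `date=today_date` directory in the question, and `Wallclock` is assumed because the goal is to group events by the day they are written.

```json
{
  "partitioner.class": "io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
  "path.format": "'date'=YYYY-MM-dd",
  "partition.duration.ms": "86400000",
  "locale": "en-US",
  "timezone": "UTC",
  "timestamp.extractor": "Wallclock"
}
```

Here `partition.duration.ms` is 24 hours in milliseconds, so each day's events go under a single `date=...` directory. The Kafka partition number then appears in the object file name rather than as a `Partition=N` directory.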
Upvotes: 2