DarthVader

Reputation: 55112

Azure Event Hubs Capture: multiple events in a single file

We are planning to use Azure Event Hubs. Our app sends events to the event hub one at a time and does not specify a partition. We enabled Capture to write the data to Azure Data Lake Storage Gen2.

With Capture enabled, each event is written to Data Lake Storage Gen2 as a single Avro file. Is it possible to write all events that occur within a time frame to a single file (CSV or Avro)? Is it better to write each event as its own file, or to write events in bulk to a single file?

Upvotes: 0

Views: 753

Answers (1)

Ivan Glasenberg

Reputation: 30035

Is it possible to write all events that occur within a time frame to a single file (CSV or Avro)?

It depends on how many partitions are being used in the event hub. Each partition is captured independently and writes its own completed block blob at capture time.

So if all events are sent to one partition (for example, your event hub has only one partition, or your code sends every event to a specific partition), only one Avro file is created per time window.

If events are distributed among partitions in a round-robin fashion (the default behavior when the sender does not specify a partition), the number of Avro files created per time window will be the same as the number of partitions.
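To make the partition effect concrete, here is a small simulation. It is a simplified model, not Event Hubs code: `assign_round_robin`, `assign_fixed` and `capture_files` are hypothetical helpers, round-robin is approximated as "event i goes to partition i % n", windows are aligned to multiples of the time window, and a window in which a partition received no events is assumed to produce no file.

```python
def assign_round_robin(num_events, num_partitions):
    """Hypothetical approximation of the default distribution: event i -> partition i % n."""
    return [i % num_partitions for i in range(num_events)]


def assign_fixed(num_events, partition_id=0):
    """All events pinned to one partition, as when the sender targets a specific partition."""
    return [partition_id] * num_events


def capture_files(timestamps, partitions, window_seconds):
    """Return the set of (partition, window_index) pairs that would each yield one Avro file.

    Simplified model: every partition is captured independently, windows are aligned
    to multiples of window_seconds, and a window with no events produces no file.
    """
    files = set()
    for ts, pid in zip(timestamps, partitions):
        files.add((pid, int(ts // window_seconds)))
    return files


if __name__ == "__main__":
    ts = list(range(100))  # 100 events, one per second, all inside one 5-minute window
    print(len(capture_files(ts, assign_round_robin(100, 4), 300)))  # one file per partition
    print(len(capture_files(ts, assign_fixed(100), 300)))           # a single file
```

With 4 partitions and round-robin distribution, the same 100 events land in 4 files for that window; pinning them to one partition yields 1 file.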

Is it better to write each event as its own file, or to write events in bulk to a single file?

Writing events in bulk to a single file is better because it lowers storage cost. But how many events end up in each file depends on how many events you send during the capture time window or size window. For example, if the capture time window is 5 minutes and you send only 1 event in those 5 minutes, the single file created for that window contains only that one event.
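The time/size window behavior can be sketched for a single partition as below. This is an illustrative model, not the Capture implementation: `capture_files_for_partition` is a hypothetical helper, windows are assumed to be aligned to multiples of `window_seconds`, a file is assumed to close as soon as its accumulated size reaches `size_limit_bytes` (whichever limit is hit first wins), and the byte sizes in the example are toy values rather than real Capture settings.

```python
def capture_files_for_partition(events, window_seconds, size_limit_bytes):
    """Group one partition's (timestamp_seconds, size_bytes) events into capture files.

    Simplified model: a file closes when its aligned time window ends or when its
    accumulated size reaches size_limit_bytes, whichever happens first.
    Returns a list of files, each a list of events.
    """
    files, current = [], []
    current_size = 0
    current_window = None
    for ts, size in sorted(events):
        window = int(ts // window_seconds)
        if current and window != current_window:
            # Time window elapsed: close the file that was being filled.
            files.append(current)
            current, current_size = [], 0
        current_window = window
        current.append((ts, size))
        current_size += size
        if current_size >= size_limit_bytes:
            # Size window reached first: close the file early.
            files.append(current)
            current, current_size = [], 0
    if current:
        files.append(current)
    return files


if __name__ == "__main__":
    one_event = [(10, 1024)]
    print(len(capture_files_for_partition(one_event, 300, 10 * 1024 * 1024)))  # 1 file, 1 event
    burst = [(t, 1024) for t in range(300)]  # 300 events of 1 KiB within one window
    print(len(capture_files_for_partition(burst, 300, 10 * 1024 * 1024)))  # 1 file holding 300 events
    print(len(capture_files_for_partition(burst, 300, 100 * 1024)))        # size limit closes 3 files
```

Writing each of those 300 events as its own file would produce 300 blobs instead of 1, which is where the storage-cost difference comes from; a single event in a window still produces one (tiny) file.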

Upvotes: 1
