Adrian Grassl

Reputation: 83

Event Hubs data to a SQL Data Warehouse (Synapse)

We are trying to integrate Event Hub (EH) data (~200 MB and 50k messages per minute) into a SQL Data Warehouse (DW) staging area.

So far we have tried to solve this by reading the EH data directly with an Azure Function (AF) and writing it to Synapse, but we are hitting the DW's maximum number of concurrent open sessions (512 for anything below DWU500c). We've also tried increasing the maxBatchSize read from the EH in order to reduce the number of sessions needed on the DW side, but this seems to make the AF quite unstable.
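
For context, this is roughly how we adjusted maxBatchSize for the Event Hubs trigger in host.json (Functions v2/v3 runtime; the numbers below are illustrative, not the exact values we tested):

```json
{
  "version": "2.0",
  "extensions": {
    "eventHubs": {
      "batchCheckpointFrequency": 1,
      "eventProcessorOptions": {
        "maxBatchSize": 256,
        "prefetchCount": 512
      }
    }
  }
}
```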

Today I found this tutorial in the Microsoft documentation, which decouples the EH from the AF by using the EH Capture feature and Event Grid (EG) to trigger the AF whenever a blob file has been written. Am I right in assuming that this should drastically reduce the maximum concurrent open sessions in the DW, since the AF would be reading much larger batches from the captured blob files, which can be up to 500 MB in size?

What are the advantages of one solution over the other? Do you have any other best practices for achieving this?

Thx in advance!

Upvotes: 1

Views: 1331

Answers (1)

Kashyap

Reputation: 17411

Cost

In general I don't like it when they ask me to push data from one queue to another queue and then to some storage before the final destination. It almost seems like you pay extra for no reason.

In the past they've tried to hawk Data Factory to us as well, for loading data from the data lake into Synapse. Azure Functions are ridiculously cheap compared to Data Factory.

Don't take the Azure-prescribed "patterns" at face value. They're great as a source of ideas on how to integrate, but I would check the cost and try to optimize.


Cheapest I can think of:

If you have control over the source of your data, then consider writing it to CSV/Parquet/ORC files in the data lake instead.

Then you can use a timer-triggered Azure Function to load them into Synapse periodically using COPY INTO, as sketched below.
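
A minimal sketch of what that periodic load could look like (the table name, storage URL and credential choice are hypothetical; adjust them to your layout):

```sql
-- Load staged CSV files from the data lake into a Synapse staging table.
-- dbo.StagingEvents and the storage path are placeholders.
COPY INTO dbo.StagingEvents
FROM 'https://mydatalake.blob.core.windows.net/landing/events/*.csv'
WITH (
    FILE_TYPE = 'CSV',
    CREDENTIAL = (IDENTITY = 'Managed Identity'),
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '0x0A',
    FIRSTROW = 2  -- skip the header row
);
```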


Future:

As far as I understand the direction of the architecture (I don't work for Microsoft and have no official knowledge), they're trying to push the COPY command as the future of PolyBase loading (which is the preferred way to load data into Synapse/DW).

In general you would want to dig/ask around to find out which sub-features of something in preview are already GA. E.g. Synapse itself is in preview and expected to be GA by end of 2020, but Synapse Pool is GA; COPY INTO is in preview, but support for loading CSV files using COPY INTO is GA, and so on...

They've also launched Azure Stream Analytics. Again, it's in preview, but it seems like the future for streaming data "into Synapse" or "from Event Hub", among the various sources and destinations it supports. Here are some solution patterns.

HTH

Upvotes: 2
