Vitor Durante

Reputation: 1063

Azure architecture best suited to save JSON from API to a Data Lake Store?

I am looking to build an endpoint capable of receiving JSON objects and saving them into ADLS. So far I have tried several different combinations of Functions, Event Hubs, and Stream Analytics. The problem is: no solution so far seems ideal.

TL;DR In my scenario, a set of users will send me JSON data through an API, and I need to save it inside ADLS, separated by user. What is the best way of doing so?

Could anyone shed some light on this? Thanks in advance.

WARNING: LONG TEXT AHEAD

Let me explain my findings so far:

Functions

Advantages

  1. single-solution approach - solves the scenario with a single service
  2. built-in authorization
  3. organization - each user's files are saved to a separate folder inside ADLS
  4. HTTP endpoint - sending data only requires a POST
  5. cheap & pay-as-you-go - charged per request

Disadvantages

  1. bindings & dependencies - Functions has no ADLS binding. To authorize against and use ADLS, I need to install extra dependencies and manage their credentials manually. I only managed to do this in C# and haven't tested other languages, which may be an additional drawback, though I can't confirm it.
  2. file management - writing one file per request is discouraged for ADLS (lots of small files hurt performance). The alternative is appending to existing files and managing their size, which means more code than the other solutions.
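To illustrate the file-management point, here is a rough sketch of the extra code appending implies: a per-user buffer that batches JSON objects and only flushes once a size threshold is reached, so ADLS sees a few larger appends instead of one tiny file per request. Everything here is an assumption for illustration - the path layout, the threshold, and the `flush_fn` callback standing in for a real ADLS SDK append call.

```python
import json
from datetime import datetime, timezone

class UserBuffer:
    """Batches JSON events per user and flushes them in bulk.
    Hypothetical sketch: flush_fn stands in for the real ADLS
    append call from the SDK."""

    def __init__(self, flush_fn, max_bytes=4 * 1024 * 1024):
        self.flush_fn = flush_fn    # e.g. a wrapper around an ADLS append
        self.max_bytes = max_bytes  # flush threshold (assumed value)
        self.buffers = {}           # user_id -> list of serialized events

    @staticmethod
    def path_for(user_id):
        # Partition by user and day so each user's data stays separate.
        day = datetime.now(timezone.utc).strftime("%Y-%m-%d")
        return f"/data/{user_id}/{day}/events.jsonl"

    def add(self, user_id, event):
        lines = self.buffers.setdefault(user_id, [])
        lines.append(json.dumps(event))
        # Flush once the buffered payload (with newlines) crosses the limit.
        if sum(len(line) + 1 for line in lines) >= self.max_bytes:
            self.flush(user_id)

    def flush(self, user_id):
        lines = self.buffers.pop(user_id, [])
        if lines:
            self.flush_fn(self.path_for(user_id), "\n".join(lines) + "\n")
```

A real Function would also need to flush leftovers on shutdown and handle concurrent instances writing to the same path - which is exactly the "more code" this disadvantage refers to.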

Event Hub

Advantages

  1. no code at all - all I need to do is enable data capture

Disadvantages

  1. one event hub per user - the only way to separate data inside ADLS through Event Hubs' capture capability is to use one event hub per user
  2. price - capturing with one event hub per user increases the price drastically
  3. authorization - sending events is not as trivial as doing a POST
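To show what "not as trivial as doing a POST" means in practice: before a client can send an event over HTTPS, it has to build a SAS token following the documented Event Hubs scheme (HMAC-SHA256 over the URL-encoded resource URI and an expiry). A sketch, with placeholder namespace and policy names:

```python
import base64
import hashlib
import hmac
import time
import urllib.parse

def generate_sas_token(resource_uri, key_name, key, ttl_seconds=3600):
    """Build an Event Hubs SAS token: sign the URL-encoded
    resource URI plus an expiry timestamp with the policy key."""
    expiry = str(int(time.time()) + ttl_seconds)
    encoded_uri = urllib.parse.quote_plus(resource_uri)
    string_to_sign = encoded_uri + "\n" + expiry
    signature = base64.b64encode(
        hmac.new(key.encode("utf-8"),
                 string_to_sign.encode("utf-8"),
                 hashlib.sha256).digest()
    )
    return ("SharedAccessSignature sr={}&sig={}&se={}&skn={}"
            .format(encoded_uri,
                    urllib.parse.quote_plus(signature),
                    expiry,
                    key_name))
```

Every client then has to attach this token as the `Authorization` header and refresh it before expiry - noticeably more ceremony than a plain POST against a Function endpoint with a key in the query string.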

Functions + Event Hub

Using Event Hubs with Functions mitigates the Functions disadvantages, but keeps the same drawbacks (except auth) as Event Hubs alone.

Functions + Event Hub + Stream Analytics

Although I could have a single event hub without capture, using a Stream Analytics SQL query as a filter to direct each user's data to its specific folder, the query itself becomes the limiting factor: I have tried it, and it gets slower as the SQL grows.

IoT Hub

IoT Hub has routing, but it is not as dynamic as I require.


Upvotes: 0

Views: 1311

Answers (1)

silent

Reputation: 16108

I don't quite see the disadvantages of using only Azure Functions to write data to ADLS.

  • As long as you don't write lots of small files, writing one file per request should not really be an issue
  • Using the .NET SDK should be pretty straightforward even without an existing binding
  • To solve the authentication piece: use Managed Service Identity (MSI) and store your client secrets in Key Vault. MSI support in the SDK is apparently on the roadmap and would then make this very easy indeed.
  • You save yourself the extra cost of an Event Hub, and I don't see a real value-add from it
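To make the MSI suggestion concrete: when MSI is enabled on a Function app, the platform injects the `MSI_ENDPOINT` and `MSI_SECRET` environment variables, and the app obtains tokens from that local endpoint instead of handling client secrets itself. A sketch of building that token request (the endpoint and secret values below are placeholders; the actual HTTP call and token parsing are omitted):

```python
import os
import urllib.parse
import urllib.request

def build_msi_token_request(resource, endpoint=None, secret=None,
                            api_version="2017-09-01"):
    """Build the local MSI token request a Function app would issue.
    MSI_ENDPOINT and MSI_SECRET are injected by the platform when
    Managed Service Identity is enabled."""
    endpoint = endpoint or os.environ["MSI_ENDPOINT"]
    secret = secret or os.environ["MSI_SECRET"]
    url = "{}?resource={}&api-version={}".format(
        endpoint, urllib.parse.quote(resource, safe=""), api_version)
    # The platform authenticates the caller via the Secret header.
    return urllib.request.Request(url, headers={"Secret": secret})

# Typical resource values: "https://vault.azure.net" for Key Vault,
# "https://datalake.azure.net/" for ADLS Gen1.
```

The returned bearer token is then used against Key Vault (to read any remaining secrets) or directly against the data-plane API, so the Function itself never stores credentials.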

Upvotes: 1
