urig

Reputation: 16831

Is Azure Data Factory suitable for downloading data from non-Azure REST APIs?

Consider a data processing pipeline as follows:

  1. Fetch a large amount of data from a REST API that's hosted somewhere on the internet and persist it to a data store.
  2. Perform some complex data transformations on the persisted data.
  3. Persist the results of the data transformations on a data store.

Aiming to implement such a pipeline in Azure, steps 2 and 3 seem like a good fit for implementation as Azure Data Factory activities.

My question is: does it make sense to implement step 1 as an Azure Data Factory activity as well?

Technically, it might be possible to code a .NET activity that performs the data download and persistence.
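To make step 1 concrete, here is a minimal sketch of the download-and-persist logic, written in Python for brevity rather than as a .NET activity. The `fetch_page` callable and the `items` / `next` field names are hypothetical; a real API will have its own pagination scheme and client code.

```python
import json
from typing import Callable, Optional


def download_to_jsonl(fetch_page: Callable[[Optional[str]], dict], out_path: str) -> int:
    """Pull every page from a paginated API and write the records to a JSON Lines file.

    fetch_page(cursor) is a hypothetical callable returning a dict shaped like
    {"items": [...], "next": <cursor or None>}.
    Returns the number of records written.
    """
    written = 0
    cursor = None  # None asks for the first page
    with open(out_path, "w", encoding="utf-8") as out:
        while True:
            page = fetch_page(cursor)
            for record in page["items"]:
                # One record per line, so only a single page is held in memory at a time.
                out.write(json.dumps(record) + "\n")
                written += 1
            cursor = page.get("next")
            if cursor is None:
                break  # no further pages
    return written
```

Writing page by page matters here because the question is about a large amount of data; buffering the whole response before persisting it would not scale.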

Upvotes: 0

Views: 433

Answers (3)

genegc

Reputation: 1702

There have been a lot of improvements to ADF in the years since this question was posted, including a REST connector. Here's the approach recommended by ADF at this time...

Copy data from a REST endpoint by using Azure Data Factory

Upvotes: 1

JustLogic

Reputation: 1738

I have done exactly that using a .NET activity. I needed to fetch data from the Salesforce API, and this has been working well for my needs. Here is a post I wrote about creating a .NET activity and storing the data in Azure Data Lake.

As Newport99's answer says, yes, you will incur costs for that activity, but I am not sure how cost-effective it would be to run a separate web app hosting a WebJob in addition to the Azure Data Factory pipeline. When I was originally designing a solution, the WebJob was my first choice, but in the end I preferred to have the whole solution use one Azure service instead of several.

Hope that helps.

Upvotes: 1

Newport99

Reputation: 483

No - do not implement step 1 in an Azure Data Factory activity.

Technically, it is possible to run the entire process from ADF, but I would argue that this choice is relatively more costly than the other options available to you, because you pay for each activity run in Azure Data Factory.

For instance, what if the REST API has no new data to offer when the scheduled activity starts? You'll still pay for that run.

You might consider the following as an easy-to-implement alternative:

  1. Create a .NET console app, publish it as a WebJob, and schedule it to run daily.
  2. The long-running console app can query the REST API, persist the data into Azure Storage / DocumentDB, and push a message onto a queue, which triggers ADF steps 2 and 3 to run against the saved data.
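A rough sketch of the console app's logic, in Python for brevity rather than .NET. The three callables are hypothetical stand-ins for the REST client, the storage writer, and the queue client; none of them is a real Azure SDK call.

```python
import json
from typing import Callable, Iterable, Optional


def run_ingest(
    fetch_new_records: Callable[[], Iterable[dict]],
    save_blob: Callable[[str, str], None],
    enqueue: Callable[[str], None],
    blob_name: str,
) -> Optional[str]:
    """Fetch, persist, then signal the downstream pipeline.

    Returns the blob name when data was saved, or None when the API had nothing new.
    """
    records = list(fetch_new_records())
    if not records:
        # Nothing new: exit without writing or queueing, so the ADF steps are never started.
        return None

    # Persist the raw payload first so the queue message always points at existing data.
    save_blob(blob_name, json.dumps(records))

    # The message tells the downstream steps (2 and 3) where the saved data lives.
    enqueue(json.dumps({"blob": blob_name, "count": len(records)}))
    return blob_name
```

The early return is the point of the design: when the API has nothing new, no message is queued and no ADF activity runs. For a genuinely large download, the in-memory `list(...)` would need to be replaced with a streaming write.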

Upvotes: 1
