br.nz
br.nz

Reputation: 45

Offset in API calls in Azure Data Factory

I am using ADF to retrieve data from a REST API and loading into an SQL DB. The problem I have is that the endpoints in the API return a max of 50 records per page.

I'm trying to use the built-in pagination in ADF by setting Pagination rules with QueryParameters, {offset} using range and the max value.

Some endpoints in the API does return total rows in the call. So I could possibly use this somehow to make the Copy Data activity dynamic.

The relative URL i use is:

@concat(
pipeline().parameters.relative_url,
'?offset={offset}'
)

And the {offset} in Pagination Rules is set up as:

@concat(
'RANGE:0:',
activity('LookUp_Total').output.firstRow.Total,
':50')

The problem I encounter is with endpoints where the total is larger than 50 I get duplicates. So if the total is for example 76 I still get 100 records.

Upvotes: 0

Views: 2158

Answers (2)

Douglas Wiley
Douglas Wiley

Reputation: 523

Assuming this type of pagination object in your payload:

pagination info in your payload

I implemented it using a foreach that iterates through a paging array. Here is the general structure:

example pipeline

Let's discuss each activity:

  1. get oauth token - retrieve the bearer token required for API authentication.

  2. first call for paging info - call the api specifically to retrieve the first pagination object.

  3. create paging array - for the Name field: create a Pipeline variable of type Array called paging_array. For the Value field: use the @range function to build the array, representing the number of calls that need to be made to the API based on the limit and total number of records to be retrieved:

@range(
    1,
    add(
        div(activity('first call for paging info').output.pagination.total, activity('first call for paging info').output.pagination.limit), 
        if(
            equals(mod(activity('first call for paging info').output.pagination.total, activity('first call for paging info').output.pagination.limit), 0),
            0,
            1
        )
    )
)
  1. foreach paging array item - the Items field simply references the paging_array created earlier. Consider setting a Batch count if your API has throttling requirements.

foreach activity Items

  1. call using item from paging array (Web activity within Foreach) - I've used a Web activity here as an example, but you may perhaps use a Copy Activity to integrate into your data warehouse or move to a storage account, etc. The important thing is that the source URL specifies the offset using the current Foreach item:
https://your.excellent.api/v1/users?limit=100,offset=@{item()} 

Watch your batch count so your API is not overwhelmed, especially in the scenario where your doing an initial full load and then subsequent deltas...

Thank you!

Douglas

Upvotes: 0

RajkumarPalnati
RajkumarPalnati

Reputation: 689

  • You can use Offset Pagination, and for that it requires two variables
    i.e, a limit and an offset.
  • limit is for the number of records to return.
  • offset is for the number of rows to skip first.
  • So, it needs you to dynamically alter the offset value, and gradually increasing till records are empty.

You can walk through the full syntax and code here.

SO - Azure Data Factory Pagination with offset.

Upvotes: 0

Related Questions