BigQuery Pagination - Use pageToken or startIndex?

Question

I will fetch approximately 500,000 to 1,000,000 rows from BigQuery. We will limit each request with an offset and a max, so pageSize = max and startIndex = offset.

Our data will only be processed once a day and then uploaded to BigQuery.

The documentation recommends using pageToken instead of startIndex. I timed both pageToken and startIndex and could not see any difference.

I found one answer here at StackOverflow:

"You should use the page token returned from the original query response or the previous jobs.getQueryResults() call to iterate through pages. This is generally more efficient and reliable than using index-based pagination"

But I'm not convinced that I should use pageToken, because I would then need to store the token to use it when going back and forth. Time-wise, I could not see any difference.

Tamir Klein · Accepted Answer

But I'm not convinced why I should use "pageToken"

There are only a few differences between the two, but they are important:

Index-based pagination - Works well when you know how many records your query returns. It does not take the size of a record into account (this is important for client-side applications).
Page token - Identifies a specific page in the result set and requires no prior information, such as the size of the results, to access it.

So if, in your case, you know how many results you have and you don't care about the page size, you can use index-based pagination; otherwise, use the page token.
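
To make the difference concrete, here is a minimal Python sketch of both loops. It does not call BigQuery: get_query_results and ROWS are hypothetical in-memory stand-ins that only imitate the startIndex, maxResults, pageToken, rows and totalRows fields of jobs.getQueryResults, and the token here is just the next offset as a string, whereas a real token is opaque. The index-based loop assumes a page may contain fewer rows than requested, so it advances by the number of rows actually received.

```python
from typing import Optional

# Hypothetical in-memory stand-in for the rows of a finished query job.
ROWS = [{"id": i} for i in range(10)]


def get_query_results(max_results: int,
                      start_index: Optional[int] = None,
                      page_token: Optional[str] = None) -> dict:
    """Imitate a simplified jobs.getQueryResults response (not the real API)."""
    if page_token is not None:
        start = int(page_token)  # real tokens are opaque; this one is an offset
    else:
        start = start_index or 0
    end = min(start + max_results, len(ROWS))
    response = {"rows": ROWS[start:end], "totalRows": str(len(ROWS))}
    if end < len(ROWS):
        response["pageToken"] = str(end)  # present only while pages remain
    return response


def fetch_by_index(page_size: int) -> list:
    """Index-based: the caller tracks the offset and compares it to totalRows."""
    rows, offset = [], 0
    while True:
        page = get_query_results(page_size, start_index=offset)
        rows.extend(page["rows"])
        # Advance by what was actually returned, not by page_size.
        offset += len(page["rows"])
        if not page["rows"] or offset >= int(page["totalRows"]):
            return rows


def fetch_by_token(page_size: int) -> list:
    """Token-based: follow pageToken until the response no longer has one."""
    rows, token = [], None
    while True:
        page = get_query_results(page_size, page_token=token)
        rows.extend(page["rows"])
        token = page.get("pageToken")
        if token is None:
            return rows
```

In this sketch both loops return the same rows. The token loop needs no knowledge of the total row count or of how many rows a page held, while the index loop has to do that bookkeeping itself. In exchange, the index loop can jump straight to any offset, which is the back-and-forth case from the question.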
