Michael B.

Reputation: 995

BigQuery Internal Error with `pageToken` when running in GCP

I run into this error with BigQuery:

"An internal error occurred and the request could not be completed. This is usually caused by a transient issue. Retrying the job with back-off as described in the BigQuery SLA should solve the problem: https://cloud.google.com/bigquery/sla. If the error continues to occur please contact support at https://cloud.google.com/support. Error: 5034996"

Two applications use the same approach with pageToken to paginate through big result sets (a sketch follows the steps):

  1. run query with initial startIndex: 0, maxResults: 10
  2. get result together with pageToken
  3. send to client
  4. ... some time may pass ...
  5. request "next page": use pageToken together with maxResults: 10 to get the next result
  6. repeat from 3.
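Sketch of this flow with @google-cloud/bigquery (the function names and the `autoPaginate: false` option are my illustration, not the exact application code):

import { BigQuery, Job } from '@google-cloud/bigquery';

const bq = new BigQuery();

// steps 1+2: run the query, fetch the first page and its pageToken
async function firstPage(sql: string) {
    const [job] = await bq.createQueryJob({ query: sql });
    const [rows, , meta] = await job.getQueryResults({
        startIndex: '0',
        maxResults: 10,
        autoPaginate: false, // assumption: return one page plus pageToken instead of all rows
    });
    return { job, rows, pageToken: meta?.pageToken };
}

// step 5: some time later, fetch the next page using only pageToken + maxResults
async function nextPage(job: Job, pageToken: string) {
    const [rows, , meta] = await job.getQueryResults({
        maxResults: 10,
        pageToken,
        autoPaginate: false,
    });
    return { rows, pageToken: meta?.pageToken };
}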

NodeJS 16, @google-cloud/bigquery 6.0.3

Locally (Windows 10), everything works for both applications: pagination with pageToken returns results quite fast (<5s). Steps 1 to 6 work, including requesting multiple next pages one after another; I even tested that the pageToken still works after 60+ minutes.

Production has problems: the initial query always works, but as soon as a pageToken is given, the query fails after ~15s, even when the next page is requested directly (1-5s delay) after getting the first page. Steps 1 to 3 work, but requesting the next page fails almost every time; it is very rare that it doesn't.

Production uses Google Cloud Functions and Google Cloud Run to serve the applications.

One application is an internal experiment; it uses the same dataset + table when running locally and when running in "production".

The other application uses the same dataset but different tables for local/production, and lives in a different Google Cloud project than the first application.

Thus project-level quotas or, e.g., different local/production table setups shouldn't cause the issue here (hopefully).

Example code used:

const [rows, , jobQueryResults] = await (job.getQueryResults(
    ('maxResults' in paginate ?
        {
            // there seems to be no need to pass startIndex again (also tested with a stable `0` index);
            // currently startIndex is omitted as soon as a pageToken is given
            startIndex: paginate.pageToken ? undefined : paginate.startIndex,
            maxResults: paginate.maxResults,
            pageToken: paginate.pageToken,
        } : undefined) as QueryResultsOptions,
) as Promise<QueryRowsResponse>)

What puzzles me is that the pageToken isn't shown in the log of the failure, while maxResults is visible:

(screenshot: Google Cloud log of the BigQuery failure)

Edit

The error suggests an SLA problem. One of the GCP projects only contains experimental (non-public) applications, so any traffic/usage can easily be monitored.

The BigQuery monitoring in that project shows roughly 1 job per second while testing: jobs 1 and 2 were loads without a pageToken, job 3 used the pageToken from job 2 and ran into the error. The retries must happen on the BigQuery side; nothing is implemented on my side (using only the official BigQuery package).

(screenshot: BigQuery monitoring of the project)

Upvotes: 0

Views: 646

Answers (1)

jrfarrant

Reputation: 1

We experienced this issue when the pageToken from a prior BigQuery job was used in the getQueryResults() method for a successive job, AND the underlying table had been modified between the prior job and the successive job. Under these conditions, the pageToken is not valid and BigQuery throws a generic error when executing the getQueryResults() method.

The solution that worked for us was to avoid running multiple jobs when paginating, and instead use the .job() method to retrieve the original job, then call the getQueryResults() method with the pageToken:

// re-attach to the original query job by its ID and location, then
// page through that job's results instead of starting a new job
const [rows] = await bqClient
  .job(jobId, {
    location: jobLocation,
  })
  .getQueryResults({
    maxResults,
    pageToken,
  });
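For this to work across requests, the job ID and location from the initial query have to be persisted and sent back by the client; a short sketch (the response shape and `autoPaginate: false` are illustrative, not our exact code):

// initial request: run the query once and keep the job's ID and location
const [job] = await bqClient.createQueryJob({ query: sql });
const [rows, , meta] = await job.getQueryResults({ maxResults, autoPaginate: false });
// reply with { rows, jobId: job.id, jobLocation: job.location, pageToken: meta?.pageToken }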

In our case, the issue was not immediately apparent, because BigQuery will accept the pageToken from a prior job when the two jobs return exactly matching results. That tends to be true in a relatively quiet development environment, but not in a production environment with continuous updates via streaming inserts.

Upvotes: 0
