Ander

Reputation: 5644

How to paginate results on a BigQuery query

According to the pagination docs, you can paginate results by defining a specific table. But what about adding pagination to a query? For example, if I have the following query:

    client = bigquery.Client(location='US')
    job_config = bigquery.QueryJobConfig()
    job_config.query_parameters = params
    result = client.query(query, job_config=job_config)

How can I paginate this query to get the rows from 10 to 20?

Upvotes: 3

Views: 13837

Answers (5)

Mark Shot

Reputation: 329

You can also use the .result() method to iterate over the data in pages; it accepts a page_size parameter:

    query_job = client.query(sql)
    result_query = query_job.result(page_size=1000)

    for page in result_query.pages:
        for row in page:
            print(row)

Upvotes: 0

Tunneller

Reputation: 589

Note that even if you don't specify an output table, the query automatically creates a table that can be downloaded in pages:

    # Wait for the query to complete, after which the data is stored in a
    # temporary "destination" table:
    query_job.result()

    page_size = 1000
    i = 0
    while True:
        offset = page_size * i
        df = bq_client.list_rows(query_job.destination, max_results=page_size,
                                 start_index=offset).to_dataframe()
        if df.empty:
            break
        print(df["Time"][0])
        i += 1

I read that the "anonymous" table should not get deleted for at least a few hours, so it should hopefully survive long enough to download.

With regard to putting LIMIT and OFFSET into the SQL, I did try this but it proved vastly slower. I guess that comes down to the complexity of the query. For my application the dataset was large, the SQL execution was very streamlined, and downloading the data was the slow part.

Upvotes: 0

Pratap Singh

Reputation: 419

You can use BigQuery jobs to achieve this:

    String query = "big query here...";
    int startIndex = 10;
    int maxResults = 10;

    // Fetches rows numbered 10 to 20 from the result set
    resultCollection = getPaginatedResultCollection(query, startIndex, maxResults);
    // NOTE: Do what you want with the paged result, i.e. resultCollection



/**
 * Polls the status of a BigQuery job, returns TableReference to results if
 * "DONE"
 */
private static TableReference checkQueryResults(Bigquery bigquery, String projectId, JobReference jobId) throws IOException, InterruptedException {
    while (true) {
        Job pollJob = bigquery.jobs().get(projectId, jobId.getJobId()).execute();
        if (pollJob.getStatus().getState().equals("DONE")) {
            return pollJob.getConfiguration().getQuery().getDestinationTable();
        }
        // Pause execution for one second before polling the job status again,
        // to reduce unnecessary calls to the BigQuery API and lower overall
        // application bandwidth.
        Thread.sleep(1000);
    }
}


/**
 * @param bigquery          authorized BigQuery client
 * @param completedJob      destination table of the completed query job
 * @param startIndex        zero-based index of the first row to fetch
 * @param maxResultsPerPage page size (defaults to 20 if null)
 * @return the paged rows, mapped into a ResultCollection
 * @throws Exception
 */
private static ResultCollection displayQueryResults(Bigquery bigquery, TableReference completedJob, int startIndex, Integer maxResultsPerPage) throws Exception {

    maxResultsPerPage = (maxResultsPerPage == null) ? 20 : maxResultsPerPage;
    // ResultCollection is your own result model
    ResultCollection resultCollection = new ResultCollection();
    TableDataList queryResult = bigquery.tabledata()
            .list(completedJob.getProjectId(), completedJob.getDatasetId(), completedJob.getTableId())
            .setMaxResults(Long.valueOf(maxResultsPerPage))
            .setStartIndex(BigInteger.valueOf(startIndex))
            .execute();

    // Table table = bigquery.tables().get(completedJob.getProjectId(), completedJob.getDatasetId(), completedJob.getTableId()).execute();
    // NOTE: The schema can be read from table.getSchema().getFields()
    if (CollectionUtils.isNotEmpty(queryResult.getRows())) {
        // NOTE: Read the rows from queryResult.getRows() and map them into
        // whatever model you want, say resultCollection here
    }

    return resultCollection;
}

Upvotes: 1

Corinne White

Reputation: 446

Do you mean something like this, where you pass the limit and offset as parameters to the query?

    from google.cloud import bigquery

    client = bigquery.Client(project="project")

    query = """SELECT name FROM `dataset` LIMIT @limit OFFSET @offset"""

    limit = 10
    offset = 11

    params = [
        bigquery.ScalarQueryParameter('limit', 'INT64', limit),
        bigquery.ScalarQueryParameter('offset', 'INT64', offset),
    ]

    job_config = bigquery.QueryJobConfig()
    job_config.query_parameters = params
    query_job = client.query(query, job_config=job_config)

    for row in query_job:
        print("{}".format(row.name))

Upvotes: 0

Martin Weitzmann

Reputation: 4746

You didn't post the query, but I guess you're looking for LIMIT 10 OFFSET 10
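For instance, a small sketch of turning a page number into that clause (the helper name and 1-based page numbering here are illustrative, not part of any BigQuery API):

```python
def page_clause(page, page_size):
    # Translate a 1-based page number into a LIMIT/OFFSET suffix:
    # page 2 with page_size 10 skips the first 10 rows.
    offset = (page - 1) * page_size
    return "LIMIT {} OFFSET {}".format(page_size, offset)

# Append the clause to the query string before passing it to client.query()
query = "SELECT name FROM `dataset.table` " + page_clause(2, 10)
```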

Upvotes: 10
