Reputation: 5644
According to the pagination docs, you can paginate results by specifying a table. But what about adding pagination to a query? For example, if I have the following query:
from google.cloud import bigquery

client = bigquery.Client(location='US')
job_config = bigquery.QueryJobConfig()
job_config.query_parameters = params  # params defined elsewhere
result = client.query(query, job_config=job_config)
How can I paginate this query to get the rows from 10 to 20?
Upvotes: 3
Views: 13837
Reputation: 329
You can also use the .result() method to iterate over the data in pages; it accepts a page_size parameter:
query_job = client.query(sql)
result_query = query_job.result(page_size=1000)

for page in result_query.pages:
    for row in page:
        print(row)
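For the original rows-10-to-20 case specifically, result() also accepts start_index and max_results, so something like this should work (a sketch, reusing the client and sql from above):

# Jump straight to 10 rows starting at row index 10 (i.e. rows 10-19).
rows = client.query(sql).result(start_index=10, max_results=10)
for row in rows:
    print(row)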
Upvotes: 0
Reputation: 589
Note that even if you don't specify an output table, the query automatically creates a temporary table that can be downloaded in pages:
# Wait for the query to complete, after which the data is stored in a
# temporary "destination" table:
query_job.result()

page_size = 1000
i = 0
while True:
    offset = page_size * i
    df = bq_client.list_rows(query_job.destination, max_results=page_size, start_index=offset).to_dataframe()
    if df.empty:
        break
    print(df["Time"][0])
    i += 1
I read that the "anonymous" table should not get deleted for at least a few hours, so it should hopefully survive long enough to download.
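If you want to check how long the temporary table will stick around, something like this should work (a minimal sketch, assuming the same bq_client and query_job as above):

# Inspect the temporary destination table, including its expiry time.
table = bq_client.get_table(query_job.destination)
print(table.full_table_id, table.num_rows, table.expires)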
With regard to putting LIMIT and OFFSET into the SQL, I did try this, but it proved vastly slower. I guess that comes down to the complexity of the query: for my application the dataset was large, the SQL execution was very streamlined, and downloading the data was the slow part.
Upvotes: 0
Reputation: 419
You can use BigQuery jobs to achieve this in two steps: poll the query job until it is done, then page through its destination table.
String query="big query here...";
int startIndex=10;
int maxResults=10;
//fetches rows numbered 10 to 20 from result set
resultCollection = getPaginatedResultCollection(query, startIndex,maxResults);
//NOTE: Do what you want to do with paged data result i.e. resultCollection
/**
 * Polls the status of a BigQuery job and returns a TableReference to the
 * results once the job is "DONE".
 */
private static TableReference checkQueryResults(Bigquery bigquery, String projectId, JobReference jobId) throws IOException, InterruptedException {
    while (true) {
        Job pollJob = bigquery.jobs().get(projectId, jobId.getJobId()).execute();
        if (pollJob.getStatus().getState().equals("DONE")) {
            return pollJob.getConfiguration().getQuery().getDestinationTable();
        }
        // Pause execution for one second before polling the job status again,
        // to reduce unnecessary calls to the BigQuery API and lower overall
        // application bandwidth.
        Thread.sleep(1000);
    }
}
/**
 * Reads one page of rows from the completed query's destination table.
 *
 * @param bigquery          the BigQuery client
 * @param completedJob      reference to the query's destination table
 * @param startIndex        zero-based index of the first row to fetch
 * @param maxResultsPerPage page size (defaults to 20 if null)
 */
private static ResultCollection displayQueryResults(Bigquery bigquery, TableReference completedJob, int startIndex, Integer maxResultsPerPage) throws Exception {
    maxResultsPerPage = (maxResultsPerPage == null) ? 20 : maxResultsPerPage;
    ResultCollection resultCollection = new ResultCollection();
    TableDataList queryResult = bigquery.tabledata()
            .list(completedJob.getProjectId(), completedJob.getDatasetId(), completedJob.getTableId())
            .setMaxResults(Long.valueOf(maxResultsPerPage))
            .setStartIndex(BigInteger.valueOf(startIndex))
            .execute();
    // Table table = bigquery.tables().get(completedJob.getProjectId(), completedJob.getDatasetId(), completedJob.getTableId()).execute();
    // NOTE: the schema can be read from table.getSchema().getFields()
    if (CollectionUtils.isNotEmpty(queryResult.getRows())) {
        // NOTE: read the rows from queryResult.getRows() and map them into
        // whatever model you want (resultCollection here).
    }
    return resultCollection;
}
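For completeness, here is a minimal sketch of the getPaginatedResultCollection glue assumed at the top. The bigquery client and PROJECT_ID are assumed to exist as fields elsewhere; the job is submitted via the standard jobs().insert API:

// Sketch: submit the query as a job, wait for it to finish, then page
// through its destination table using the two helpers above.
private static ResultCollection getPaginatedResultCollection(String query, int startIndex, int maxResults) throws Exception {
    JobConfigurationQuery queryConfig = new JobConfigurationQuery().setQuery(query);
    Job job = new Job().setConfiguration(new JobConfiguration().setQuery(queryConfig));
    Job insertedJob = bigquery.jobs().insert(PROJECT_ID, job).execute();
    TableReference destination = checkQueryResults(bigquery, PROJECT_ID, insertedJob.getJobReference());
    return displayQueryResults(bigquery, destination, startIndex, maxResults);
}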
Upvotes: 1
Reputation: 446
Do you mean something like this, where you pass the limit and offset as parameters to the query?
from google.cloud import bigquery

client = bigquery.Client(project="project")

query = """SELECT name FROM `dataset` LIMIT @limit OFFSET @offset"""
limit = 10
offset = 11
params = [
    bigquery.ScalarQueryParameter('limit', 'INT64', limit),
    bigquery.ScalarQueryParameter('offset', 'INT64', offset),
]

job_config = bigquery.QueryJobConfig()
job_config.query_parameters = params
query_job = client.query(query, job_config=job_config)

for row in query_job:
    print("{}".format(row.name))
Upvotes: 0
Reputation: 4746
You didn't post the query, but I guess you're looking for LIMIT 10 OFFSET 10
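For the rows-10-to-20 case from the question, that would look something like this (the table and column names are placeholders, and an ORDER BY is needed for OFFSET to be deterministic):

# Sketch, reusing the client from the question: skips the first 10 rows
# and returns the next 10.
query = """
    SELECT *
    FROM `mydataset.mytable`
    ORDER BY some_column
    LIMIT 10 OFFSET 10
"""
for row in client.query(query).result():
    print(row)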
Upvotes: 10