Reputation: 65600
I am using BigQuery with Python. I am trying to work out how to run a simple SELECT
query, but I am getting errors about large results.
I have tested my query in the BigQuery interface before writing it in Python. It runs fine, returns 1 row, takes 4.0 seconds and processes 18.2GB. The underlying table is about 150GB, 200m rows.
This is my code:
credentials = GoogleCredentials.get_application_default()
bigquery_service = build('bigquery', 'v2', credentials=credentials)

try:
    query_request = bigquery_service.jobs()
    query_data = {
        "allowLargeResults": True,
        'query': (
            'SELECT org_code, item_code FROM [mytable] ',
            "WHERE (time_period='201501') ",
            "AND item_code='0212000AAAAAAAA' ",
            "AND (org_code='B82005') "
            "LIMIT 10;"
        )
    }

    print ' '.join(query_data['query'])

    response = query_request.query(
        projectId=project_id,
        body=query_data).execute()
    job_ref = response['jobReference']
    print 'job_ref', job_ref

except HttpError as err:
    print('Error: {}'.format(err.content))
    raise err
This is the output I get:
SELECT org_code, item_code FROM [mytable] WHERE (time_period='201501') AND (item_code='0212000AAAAAAAA') AND (org_code='B82005') LIMIT 10;
Error: {
  "error": {
    "errors": [
      {
        "domain": "global",
        "reason": "responseTooLarge",
        "message": "Response too large to return. Consider setting allowLargeResults to true in your job configuration. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors"
      }
    ],
    "code": 403,
    "message": "Response too large to return. Consider setting allowLargeResults to true in your job configuration. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors"
  }
}
Traceback (most recent call last):
  File "query.py", line 93, in <module>
    main(args.project_id)
  File "query.py", line 82, in main
    raise err
googleapiclient.errors.HttpError: <HttpError 403 when requesting https://www.googleapis.com/bigquery/v2/projects/824821804911/queries?alt=json returned "Response too large to return. Consider setting allowLargeResults to true in your job configuration. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors">
There are a couple of different things that confuse me about this:
- The error tells me to set allowLargeResults to true, even though I already am.
- This is a simple SELECT query with no grouping, and it returns 1 row. I understand that the warning will fire if any part of the query processing becomes too large. But I don't really know how to get round this, given the query I'm doing is just a SELECT with no grouping etc. I'm not even using SELECT *.
Surely the whole point of BigQuery is that it can handle this kind of thing?
How can I fix this problem?
Upvotes: 2
Views: 7714
Reputation: 497
I had the same issue. I solved it by using jobs().insert() instead of jobs().query(). Specify allowLargeResults as true, and also provide a destinationTable for the query.
Here is the sample code:
job_data = {
    "jobReference": {
        "projectId": "project_id"
    },
    "configuration": {
        "query": {
            "query": "query",
            "allowLargeResults": True,
            "destinationTable": {
                "projectId": "project_id",
                "tableId": "table_name",
                "datasetId": "dataset_name"
            }
        }
    }
}

return bigquery.jobs().insert(
    projectId="project_id",
    body=job_data).execute()
Upvotes: 0
Reputation: 26637
I took a look at your job; you're not setting allowLargeResults, and you're also not using a limit or a filter (your query was essentially just selecting two fields from the table).
There are two ways to run a query in the BigQuery API. The first is to call jobs.query(). This is the 'simple' way, but it lacks some bells and whistles. The other is to call jobs.insert() with a query job configuration; this has full support for things like setting a destination table and allowing large results.
It looks like you are calling the former (jobs.query()), but you want jobs.insert().
It is easier than it sounds to use the more fully-fledged jobs.insert() call. You can get the job id back from the jobs.insert() call, and then pass it to jobs.getQueryResults() to fetch the query results; the format of the results returned by that method is the same as from jobs.query(). Check out the sample code here.
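For illustration, here is a minimal sketch of that flow using the same google-api-python-client setup as the question; the project id, dataset and table names, and the polling loop are placeholder assumptions, not taken from the linked sample:

import time

from googleapiclient.discovery import build
from oauth2client.client import GoogleCredentials

project_id = 'my-project-id'  # placeholder, as in the question

credentials = GoogleCredentials.get_application_default()
bigquery_service = build('bigquery', 'v2', credentials=credentials)

# Insert a query job; allowLargeResults requires a destination table
# ('my_dataset' and 'my_results_table' are placeholder names).
job_data = {
    'configuration': {
        'query': {
            'query': "SELECT org_code, item_code FROM [mytable] WHERE time_period='201501' LIMIT 10",
            'allowLargeResults': True,
            'destinationTable': {
                'projectId': project_id,
                'datasetId': 'my_dataset',
                'tableId': 'my_results_table'
            }
        }
    }
}
insert_response = bigquery_service.jobs().insert(
    projectId=project_id,
    body=job_data).execute()
job_id = insert_response['jobReference']['jobId']

# Poll jobs.getQueryResults() until the job completes; the response
# has the same shape as the one returned by jobs.query().
while True:
    results = bigquery_service.jobs().getQueryResults(
        projectId=project_id,
        jobId=job_id).execute()
    if results.get('jobComplete'):
        for row in results.get('rows', []):
            print [cell['v'] for cell in row['f']]
        break
    time.sleep(1)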
Upvotes: 2
Reputation: 173191
If configuration.query.allowLargeResults is set to true, it also requires configuration.query.destinationTable to be set.
You should either add a destinationTable object or (as your output seems to be small) set allowLargeResults to false; a sketch of that second option follows the example below.
Added example of configuration:
'query': {
    'query': 'my_query_text',
    'destinationTable': {
        'projectId': 'my_project',
        'datasetId': 'my_dataset',
        'tableId': 'my_table'
    },
    'createDisposition': 'CREATE_IF_NEEDED',
    'writeDisposition': 'WRITE_TRUNCATE',
    'allowLargeResults': True
}
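And a minimal sketch of the second option, assuming the result really is small enough to fit in a normal query response (again with 'my_query_text' as a placeholder); no destination table is needed here:

'query': {
    'query': 'my_query_text',
    'allowLargeResults': False
}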
Upvotes: 3
Reputation: 208042
Let's clear up some things which are wrong here.
The documentation is clear about configuration.query.allowLargeResults:
If true, allows the query to produce arbitrarily large result tables at a slight cost in performance. Requires destinationTable to be set.
Also, queries that return large results are subject to additional limitations.
Upvotes: 0