Richard
Richard

Reputation: 65600

"Response too large to return" on simple SELECT in BigQuery, even with allowLargeResults=True?

I am using BigQuery with Python. I am trying to work out how to run a simple SELECT query, but I am getting errors about large results.

I have tested my query in the BigQuery interface before writing it in Python. It runs fine, returns 1 row, takes 4.0 seconds and processes 18.2GB. The underlying table is about 150GB, 200m rows.

This is my code:

credentials = GoogleCredentials.get_application_default()
bigquery_service = build('bigquery', 'v2', credentials=credentials)
try:
    query_request = bigquery_service.jobs()
    query_data = {
        "allowLargeResults": True,
        'query': (
            'SELECT org_code, item_code FROM [mytable] ',
            "WHERE (time_period='201501') ",
            "AND item_code='0212000AAAAAAAA' ",
            "AND (org_code='B82005') "
            "LIMIT 10;"
        )
    }
    print ' '.join(query_data['query'])
    response = query_request.query(
        projectId=project_id,
        body=query_data).execute()
    job_ref = response['jobReference']
    print 'job_ref', job_ref

except HttpError as err:
    print('Error: {}'.format(err.content))
    raise err

This is the output I get:

SELECT org_code, item_code FROM [mytable]  WHERE (time_period='201501')  AND (item_code='0212000AAAAAAAA')  AND (org_code='B82005') LIMIT 10;
Error: {
 "error": {
  "errors": [
   {
    "domain": "global",
    "reason": "responseTooLarge",
    "message": "Response too large to return. Consider setting allowLargeResults to true in your job configuration. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors"
   }
  ],
  "code": 403,
  "message": "Response too large to return. Consider setting allowLargeResults to true in your job configuration. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors"
 }
}

Traceback (most recent call last):
  File "query.py", line 93, in <module>
    main(args.project_id)
  File "query.py", line 82, in main
    raise err
googleapiclient.errors.HttpError: <HttpError 403 when requesting https://www.googleapis.com/bigquery/v2/projects/824821804911/queries?alt=json returned "Response too large to return. Consider setting allowLargeResults to true in your job configuration. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors">

There are a couple of different things that confuse me about this:

  1. It says I should use allowLargeResults, even though I already am.
  2. It's giving me a warning about large results, although this is a straightforward SELECT query with no grouping, and it returns 1 row.

I understand that the warning will fire if any part of the query processing becomes too large. But I don't really know how to get round this, given the query I'm doing is just a SELECT with no grouping etc. I'm not even using SELECT *.

Surely the whole point of BigQuery is that it can handle this kind of thing?

How can I fix this problem?

Upvotes: 2

Views: 7714

Answers (5)

chans
chans

Reputation: 497

I had a same issue. I solved it by using job.insert() instead of job.query(). Specify true for allowLargeResults. Also give destinationTable for query.

Here is the sample code:

job_data = {
"jobReference": {
  "projectId": "project_id"
},
"configuration": {
  "query": {
     "query": "query",
     "allowLargeResults": "True",
     "destinationTable": {
        "projectId": "project_id",
        "tableId": "table_name",
        "datasetId": "dataset_name"
     }
  }
}
}

return bigquery.jobs().insert(
    projectId="project_id",
    body=job_data).execute()

Upvotes: 0

Jordan Tigani
Jordan Tigani

Reputation: 26637

I took a look at your job; you're not setting allowLargeResults, and you're also not using a limit or a filter (your query was essentially just selecting two fields from the table).

There are two ways to run a query in the BigQuery API. The first is to call the jobs.query(). This is the 'simple' way, but it is lacking some bells and whistles. The other is to call jobs.insert() with a query job configuration. This has full support for things like setting a destination table and allowing large results.

It looks like you are calling the former (jobs.query()), but you want jobs.insert().

It is easier than it sounds to use the more fully-fledged jobs.insert() call. You can get the job id back from the jobs.insert() call, and then pass that to jobs.getQueryResults() to get the query results; the format of the results returned by that method is the same as calling jobs.query(). Check out the sample code here.

Upvotes: 2

Dan Delorey
Dan Delorey

Reputation: 161

Is [mytable] possibly a view rather than a table?

Upvotes: 0

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173191

If configuration.query.allowLargeResults is set to true - it also requires configuration.query.destinationTable

You should either add destinationTable object or (as your output seems to be small) set allowLargeResults to false

Added example of configuration:

'query': {
    'query': 'my_query_text',
    'destinationTable': {
        'projectId': 'my_project',
        'datasetId': 'my_dataset',
        'tableId': 'my_table'
    },
    'createDisposition': 'CREATE_IF_NEEDED',
    'writeDisposition': 'WRITE_TRUNCATE',
    'allowLargeResults': True
}

Upvotes: 3

Pentium10
Pentium10

Reputation: 208042

Let's clear some things which are wrong here.

Queries that return large results are subject to additional limitations:

  • You must specify a destination table.
  • You can't specify a top-level ORDER BY, TOP or LIMIT clause. Doing so negates the benefit of using allowLargeResults, because the query output can no longer be computed in parallel.
  • Window functions can return large query results only if used in conjunction with a PARTITION BY clause.

The documentation is clear about configuration.query.allowLargeResults If true, allows the query to produce arbitrarily large result tables at a slight cost in performance. Requires destinationTable to be set.

Upvotes: 0

Related Questions