Tim S
Tim S

Reputation: 185

Google Cloud datalab error querying BIgQuery tables

I think I am missing something basic here, can't seem to figure out what it is..

Querying BigQuery date partitioned table from Google cloud datalab. Most of the other queries fetches data as expected, not sure why in this particular table, select would not work, however count(1) query works.

%%sql
select * from Mydataset.sample_sales_yearly_part limit 10

I get below error:

KeyErrorTraceback (most recent call last) /usr/local/lib/python2.7/dist-packages/IPython/core/formatters.pyc in
__call__(self, obj)
    305                 pass
    306             else:
--> 307                 return printer(obj)
    308             # Finally look for special method names
    309             method = get_real_method(obj, self.print_method)

/usr/local/lib/python2.7/dist-packages/datalab/bigquery/commands/_bigquery.pyc in _repr_html_query_results_table(results)
    999     1000 def _repr_html_query_results_table(results):
-> 1001   return _table_viewer(results)    1002     1003 

/usr/local/lib/python2.7/dist-packages/datalab/bigquery/commands/_bigquery.pyc in _table_viewer(table, rows_per_page, fields)
    969     meta_time = ''
    970 
--> 971   data, total_count = datalab.utils.commands.get_data(table, fields, first_row=0, count=rows_per_page)
    972 
    973   if total_count < 0:

/usr/local/lib/python2.7/dist-packages/datalab/utils/commands/_utils.pyc in get_data(source, fields, env, first_row, count, schema)
    226     return _get_data_from_table(source.results(), fields, first_row, count, schema)
    227   elif isinstance(source, datalab.bigquery.Table):
--> 228     return _get_data_from_table(source, fields, first_row, count, schema)
    229   else:
    230     raise Exception("Cannot chart %s; unsupported object type" % source)

/usr/local/lib/python2.7/dist-packages/datalab/utils/commands/_utils.pyc in _get_data_from_table(source, fields, first_row, count, schema)
    174   gen = source.range(first_row, count) if count >= 0 else source
    175   rows = [{'c': [{'v': row[c]} if c in row else {} for c in fields]} for row in gen]
--> 176   return {'cols': _get_cols(fields, schema), 'rows': rows}, source.length
    177 
    178 

/usr/local/lib/python2.7/dist-packages/datalab/utils/commands/_utils.pyc in _get_cols(fields, schema)
    108     if schema:
    109       f = schema[col]
--> 110       cols.append({'id': f.name, 'label': f.name, 'type': typemap[f.data_type]})
    111     else:
    112       # This will only happen if we had no rows to infer a schema from, so the type

KeyError: u'DATE'

QueryResultsTable job_Ckq91E5HuI8GAMPteXKeHYWMwMo

Upvotes: 2

Views: 435

Answers (1)

Omar Jarjur
Omar Jarjur

Reputation: 1066

You may be hitting an issue that was just fixed in https://github.com/googledatalab/pydatalab/pull/68 (but not yet included in a Datalab release).

The background is that the new "Standard SQL" support in BigQuery added new datatypes that can show up in the results schema, and Datalab was not yet updated to handle those.

The next release of Datalab should fix this, but in the mean time you can work around it by wrapping your date fields in an explicit cast to TIMESTAMP as part of your query.

For example, if you see that error with the following code cell:

%%sql SELECT COUNT(*) as count, d FROM <mytable>

(where 'd' is a field of type 'DATE'), then you can work around the issue by casting that field to a TIMESTAMP like this:

%%sql SELECT COUNT(*) as count, TIMESTAMP(d) FROM <mytable>

For your particular query, you'll have to change '*' to the list of fields, so that you can cast the one with a date to a timestamp.

Upvotes: 3

Related Questions