Kunal Jain

Reputation: 36

Importing data into BigQuery from a Datastore export with a schema

I have a Datastore export that I want to import into BigQuery while specifying a schema for the table.

When I specify the schema, I receive the following error:

google.api_core.exceptions.BadRequest: 400 POST https://bigquery.googleapis.com/bigquery/v2/projects/easylr-184605/jobs: Datastore backup imports may not specify a schema.

I understand that specifying a schema is not required, but I want to do so anyway: automatic inference creates the numeric columns with RECORD type, which means the integer and float values have to be queried separately. I want to avoid that entirely by setting the column type to FLOAT.

Code Snippet for Importing

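from google.cloud import bigquery

# bigquery_client, source_file, table_id and schema are defined elsewhere.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.DATASTORE_BACKUP,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    schema=schema,  # passing a schema here is what triggers the 400 error
)
job = bigquery_client.load_table_from_uri(
    source_file, table_id, job_config=job_config
)
job.result()  # block until the load job finishes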

where schema is of the form

[
  {
    "description": "field1",
    "mode": "NULLABLE",
    "name": "field1",
    "type": "FLOAT"
  },
  {
    "description": "field2",
    "mode": "NULLABLE",
    "name": "field2",
    "type": "FLOAT"
  }
]

Upvotes: 0

Views: 405

Answers (1)

rmesteves

Reputation: 4075

BigQuery does not allow a schema to be set on a load job that reads a Datastore export, so the schema is always inferred from the data. If you try to load the export through the UI, for example, you will see the message:

Source file defines the schema

The BigQuery documentation on loading Datastore exports describes how types are converted between Datastore and BigQuery.
According to it, a floating-point number is converted to FLOAT and an integer is converted to INTEGER, which makes me think your Datastore fields may have a different type.
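
To check which types were actually inferred, one option is to dump the loaded table's schema as JSON (for example with `bq show --schema --format=prettyjson`) and list the sub-fields of every RECORD column. The helper below is only a sketch: the name `record_subfields` is made up for illustration, and it assumes the schema is a list of dicts in the same shape as the schema shown in the question, with nested columns under a `fields` key.

```python
def record_subfields(schema):
    """Map each top-level RECORD column to its (sub-field name, type) pairs.

    `schema` is a list of field dicts in BigQuery's JSON schema format;
    nested columns are expected under the "fields" key.
    """
    result = {}
    for field in schema:
        # RECORD is the legacy SQL name, STRUCT the standard SQL name.
        if field.get("type") in ("RECORD", "STRUCT"):
            result[field["name"]] = [
                (sub["name"], sub["type"]) for sub in field.get("fields", [])
            ]
    return result


# Illustrative schema only; the real one comes from the loaded table.
example_schema = [
    {
        "name": "field1",
        "type": "RECORD",
        "mode": "NULLABLE",
        "fields": [
            {"name": "integer", "type": "INTEGER"},
            {"name": "float", "type": "FLOAT"},
        ],
    },
    {"name": "field2", "type": "FLOAT", "mode": "NULLABLE"},
]
print(record_subfields(example_schema))
```

Columns that show up here with both an integer and a float sub-field are the ones worth flattening.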

Nevertheless, if you want to change the schema in BigQuery, I can suggest three possible approaches:

  1. Use a view as the final table.
  2. Create a scheduled query that reads your table once it is loaded and saves the results in another table with the right schema.
  3. If you load the Datastore exports manually and only occasionally, just create a new table with the desired schema by querying the table that has the wrong schema.
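
Approaches 2 and 3 both come down to running a query that collapses each numeric RECORD into a single FLOAT64 column and writing the result to another table. Here is a minimal sketch of that idea. It assumes the inferred RECORD columns have sub-fields named `float` and `integer` (check your table's actual schema first); the function names and the table IDs in the usage note are hypothetical.

```python
def build_flatten_query(source_table, numeric_fields,
                        float_subfield="float", integer_subfield="integer"):
    """Build a SELECT that replaces each numeric RECORD with one FLOAT64 column.

    The sub-field names are assumptions about the inferred schema.
    """
    if not numeric_fields:
        raise ValueError("numeric_fields must not be empty")
    casts = [
        f"COALESCE(`{name}`.`{float_subfield}`, "
        f"CAST(`{name}`.`{integer_subfield}` AS FLOAT64)) AS `{name}`"
        for name in numeric_fields
    ]
    # EXCEPT drops the original RECORD columns so the new names do not collide.
    except_list = ", ".join(f"`{name}`" for name in numeric_fields)
    return (
        f"SELECT * EXCEPT ({except_list}),\n  "
        + ",\n  ".join(casts)
        + f"\nFROM `{source_table}`"
    )


def materialize(client, source_table, destination_table, numeric_fields):
    """Run the flattening query and overwrite destination_table with the result."""
    # Imported here so build_flatten_query works without the client library.
    from google.cloud import bigquery

    job_config = bigquery.QueryJobConfig(
        destination=destination_table,
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )
    query = build_flatten_query(source_table, numeric_fields)
    client.query(query, job_config=job_config).result()  # wait for completion
```

For example, `materialize(bigquery_client, "my-project.my_dataset.raw_export", "my-project.my_dataset.clean", ["field1", "field2"])` could be called right after the load job finishes. The same SELECT text can also serve as the body of a view (approach 1) or of a scheduled query.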

Upvotes: 1
