egotchi

Reputation: 47

Loading from Google Cloud Storage to Google BigQuery using the command-line tool

Until now I have been using the BigQuery web tool to load from a backup of my data that is automatically saved to Cloud Storage. I store these backups three times a week, in three different buckets depending on the weekday (Monday, Wednesday, Friday).

The GAE backup tool saves the .backup_info files with a very long name (for example: ahNzfmVnb2xpa2Vwcm9kdWN0aW9uckELEhxfQUVfRGF0YXN0b3JlQWRtaW5fT3BlcmF0aW9uGIrD6wMMCxIWX0FFX0JhY2t1cF9JbmZvcm1hdGlvbhgBDA.entityName.backup_info), and I don't know how that name is determined or whether I can set a simpler one. I can only name the "output-X-retry-Y" files. Is there any way to change this?

On the other hand, I'm trying the command-line tool; I want to move from the web tool to it.

I've tried the load command, but I don't know how to automatically generate the schema from the backup, the way the web tool does in the 'Specify schema' section.

I always get an error about not specifying the schema when I try this format:

bq load dataset.table gs://path

Is it possible to skip specifying the schema, the same way I skip it in the web tool?

Upvotes: 2

Views: 384

Answers (1)

Jordan Tigani

Reputation: 26637

If you're running bq load to import a GAE datastore backup, you should add the --source_format=DATASTORE_BACKUP flag. Note you need to add that flag after load but before the table name:

bq load --source_format=DATASTORE_BACKUP dataset.table gs://path

That will tell BigQuery that you're loading from a datastore backup, which has a self-describing schema.

As far as I know, there isn't a way to control the generated name of the datastore backup.
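Since the name is auto-generated, one possible workaround (a sketch, not something from the original answer) is to glob for the .backup_info file with gsutil and pass whatever it finds to bq load. The bucket, kind, dataset, and table names below are placeholders:

#!/bin/bash
# Sketch: load a datastore backup for one kind into BigQuery.
# Bucket, kind, dataset, and table names are placeholders -- adjust for your setup.
BUCKET="gs://my-monday-bucket"
KIND="entityName"

# The backup tool generates the long prefix, but the file always ends in
# .<kind>.backup_info, so glob for it and take the last match in the listing.
BACKUP_INFO=$(gsutil ls "${BUCKET}/*.${KIND}.backup_info" | tail -n 1)

# The backup is self-describing, so no schema needs to be specified.
bq load --source_format=DATASTORE_BACKUP mydataset.mytable "${BACKUP_INFO}"

If several backups share a bucket, you may need to pick the right .backup_info by timestamp (e.g. with gsutil ls -l) rather than taking the last listing entry.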

Upvotes: 5
