uner

Reputation: 131

BigQuery InternalError loading from Cloud Storage (works with direct file upload)

Whenever I try to load a CSV file stored in Cloud Storage into BigQuery, I get an InternalError, both in the web interface and on the command line. The CSV is an abbreviated part of the Google Ngram dataset.

A command like:

bq load 1grams.ngrams gs://otichybucket/import_test.csv word:STRING,year:INTEGER,freq:INTEGER,volume:INTEGER

gives me:

BigQuery error in load operation: Error processing job 'otichyproject1:bqjob_r28187461b449065a_000001504e747a35_1': An internal error occurred and the request could not be completed.

However, when I load this file directly using the web interface and the File upload as a source (loading from my local drive), it works.

I need to load from Cloud Storage, since I need to load much larger files (original ngrams datasets).

I tried different files, and the result is always the same.

Upvotes: 1

Views: 241

Answers (1)

Jordan Tigani

Reputation: 26637

I'm an engineer on the BigQuery team. I was able to look up your job, and it looks like there was a problem reading the Google Cloud Storage object.

Unfortunately, we didn't log much of the context, but looking at the code, the things that could cause this are:

  1. The URI you specified for the job is somehow malformed. It doesn't look malformed, but maybe there is some odd non-printing UTF-8 character that I didn't notice.

  2. The 'region' for your bucket is somehow unexpected. Is there any chance you've set the data location on your GCS bucket to something other than US, EU, or ASIA? See here for more info on bucket locations. If so, and you've set the location to a region rather than a continent, that could cause this error.

  3. There could have been some internal error in GCS that caused this. However, I didn't see this in any of the logs, and it should be fairly rare.
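To rule out the first two causes yourself, something like the following might help. It is only a rough sketch, not what BigQuery does internally: `uri_is_clean` and `location_is_multi_region` are hypothetical helper names, and the second one assumes you have already read the bucket's location (for example from the "Location constraint" line that `gsutil ls -L -b` prints).

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the URI consists solely of
# printable ASCII bytes (no control or non-ASCII bytes hiding in it).
uri_is_clean() {
  # LC_ALL=C makes grep compare byte by byte; [^ -~] matches any byte
  # outside the printable ASCII range (space through tilde).
  if printf '%s' "$1" | LC_ALL=C grep -q '[^ -~]'; then
    return 1
  fi
  return 0
}

# Hypothetical helper: succeeds only for the continent-level locations
# mentioned above (US, EU, ASIA); anything else is treated as regional.
location_is_multi_region() {
  case "$1" in
    US|EU|ASIA) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage (the gsutil call needs access to the bucket, so it is
# left commented out here):
#   uri_is_clean 'gs://otichybucket/import_test.csv' || echo 'URI has odd bytes'
#   gsutil ls -L -b gs://otichybucket   # look for the "Location constraint" line
#   location_is_multi_region 'US' || echo 'bucket is regional'
```

If the URI check fails, retyping the URI by hand rather than pasting it is probably the quickest fix.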

We're adding more logging to detect this in the future, and we're fixing the issue with regional buckets. Loads from regional buckets may still fail, because BigQuery doesn't support cross-region data movement, but at least they will fail with an intelligible error.

Upvotes: 2
