Reputation: 13
I submitted a load job to Google BigQuery that loads 12 gzip-compressed tabular files from Google Cloud Storage. Each file is about 2 GB compressed. The command I ran was similar to:
bq load --nosync --skip_leading_rows=1 --source_format=CSV \
    --max_bad_records=14000 -F "\t" warehouse:some_dataset.2014_lines \
    gs://bucket/file1.gz,gs://bucket/file2.gz,gs://bucket/file12.gz \
    schema.txt
I'm receiving the following error from my BigQuery load job with no explanation of why:
Error Reason:internalError. Get more information about this error at Troubleshooting Errors: internalError.
Errors: Unexpected. Please try again.
I'm certain that the schema file is correctly formatted, as I've successfully loaded a different set of files using the same schema.
In what kinds of situations does an internal error like this occur, and how could I go about debugging it?
My BQ job id: bqjob_r78ca777a8ad4bdd9_0000014e2dc86e0e_1
Thank you!
Upvotes: 1
Views: 498
Reputation: 239
Large .gz input files can fail in ways that are not always reported with a clear cause. This happens especially (but not exclusively) with highly compressible text, where 1 GB of compressed data expands to an unusually large amount of text.
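To see whether your files fall into the highly compressible case, you could compare the compressed and uncompressed sizes locally. This is only a rough sketch: `gzip_sizes` is a hypothetical helper name, and it assumes you have a local copy of the file (it is not part of any BigQuery tooling).

```python
import gzip
import os


def gzip_sizes(path, chunk_size=1024 * 1024):
    """Return (compressed_bytes, uncompressed_bytes) for a gzip file.

    The file is streamed in chunks, so it never has to fit in memory.
    """
    compressed = os.path.getsize(path)
    uncompressed = 0
    with gzip.open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            uncompressed += len(chunk)
    return compressed, uncompressed
```

Dividing the second number by the first gives the expansion ratio; a very high ratio would suggest the file is worth splitting before loading.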
The documented limit on this page for compressed CSV/JSON is 1 GB. If that is current, I would actually expect an error on your 2 GB input. Let me check that.
Are you able to split these files into smaller pieces and try again?
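If you want to try the split locally, something along these lines might work. It is a sketch under several assumptions: `split_gzip`, `dest_prefix`, and the 512 MB default are made up for illustration, the file has exactly one header row (matching `--skip_leading_rows=1`), and no field contains an embedded newline, since the split happens on line boundaries.

```python
import gzip


def split_gzip(src_path, dest_prefix, max_bytes=512 * 1024 * 1024):
    """Split a gzipped text file into smaller gzipped pieces.

    max_bytes caps the *uncompressed* data bytes per piece (header excluded).
    The source's first line is repeated at the top of every piece so that
    each piece can still be loaded with one leading row skipped.
    Returns the list of piece paths that were written.
    """
    paths = []
    out = None
    written = 0
    with gzip.open(src_path, "rb") as src:
        header = src.readline()
        for line in src:
            # Start a new piece when the current one would overflow.
            # The written > 0 check avoids emitting a header-only piece
            # when a single line is longer than max_bytes.
            if out is None or (written > 0 and written + len(line) > max_bytes):
                if out is not None:
                    out.close()
                path = f"{dest_prefix}{len(paths):04d}.gz"
                out = gzip.open(path, "wb")
                out.write(header)
                paths.append(path)
                written = 0
            out.write(line)
            written += len(line)
    if out is not None:
        out.close()
    return paths
```

For example, `split_gzip("file1.gz", "file1_part_")` would write `file1_part_0000.gz`, `file1_part_0001.gz`, and so on. You would then upload the pieces to Cloud Storage and list them in the load command in place of the original file.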
(Meta: Grace, you are correct that Google says that "Google engineers monitor and answer questions with the tag google-bigquery" on StackOverflow. I am a Google engineer, but there are also many knowledgeable people here who are not. Google's docs could perhaps give more explicit guidance: the questions that are most valuable to the StackOverflow community are ones where a future reader can recognize that they are seeing the same problem, and preferably ones that a non-Googler can answer from public information. It's tough in your case because the error is broad and the cause is unclear. But if you're able to reproduce the problem using an input file that you can make public, more people here will be able to take a crack at it. You can also file an issue for questions that no one outside Google can do much with.)
Upvotes: 1