user3911479

Reputation: 83

Internal Error while loading Bigquery table

I'm getting the following error when loading a 1.3 GB JSON file with 10 million records using bq load --source_format=NEWLINE_DELIMITED_JSON.

If I put only the first 1 million records into a separate file, it loads fine, but when I try to run on the full file, I get this:

Current status: PENDING
Waiting on bqjob_r6ac3e4
BigQuery error in load operation: Error processing job 'my-project-prod:bqjob_r6ac3e4da72b48e4f_000001528037b394_1': Too many errors encountered. Limit is: 0.
Failure details:
- File: 0: An internal error occurred and the request could not be completed.

I've been able to load other large tables but always get this error when I go to load this one. Is there a way to troubleshoot this other than breaking the file into smaller and smaller pieces to try to find the offending line?

(similar to Internal error while loading to Bigquery table)

Upvotes: 1

Views: 910

Answers (3)

Priya Agarwal

Reputation: 512

As per the link: https://cloud.google.com/bigquery/docs/loading-data#limitations

Currently, when you load data into BigQuery, gzip is the only supported file compression type for CSV and JSON files.

Since you mentioned you were trying to load a bzip file (which is not a supported format), that could be why you are getting the error. Try decompressing the file and loading it; that might help.
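
As a minimal sketch of that workflow, assuming the source file is named records.json.bz2 and the destination is mydataset.mytable (the file, dataset, table, and schema names are placeholders):

    bunzip2 records.json.bz2      # remove the unsupported bzip2 compression
    gzip records.json             # optional: recompress with gzip, which BigQuery does support for JSON
    bq load --source_format=NEWLINE_DELIMITED_JSON mydataset.mytable records.json.gz ./schema.json

You can also skip the gzip step and load the uncompressed file directly.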

Upvotes: 0

Cheng Miezianko

Reputation: 211

Looking at our logs for your job bqjob_r6ac3e4da72b48e4f_000001528037b394_1, it seems we cannot read the first file (possibly other files as well, but it was complaining about the first one).

Is the file gzipped? We've seen similar errors in the past when a file is incorrectly compressed.

Of course, it could be another issue, but I don't have enough information right now. It would be helpful if you could share the other failed job IDs with us; I can check in our backend whether those import jobs are failing consistently on file 0. Thanks!
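
If the file is gzipped, one quick way to sanity-check the compression locally (assuming the file is named records.json.gz; the name is a placeholder) is gzip's built-in integrity test:

    gzip -t records.json.gz       # silently succeeds if the archive is intact
    echo $?                       # a non-zero exit status points to a corrupt or truncated archive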

Upvotes: 3

oulenz

Reputation: 1244

If you go to the job in BigQuery's web UI, it should show you the first five errors. These may or may not be helpful.

In addition, you could set the maximum number of allowed bad records to a very high number (e.g. 10,000,000). That way, the offending lines will simply be skipped, and you can try to identify them by inspecting the result. (In the Java API this is the method JobConfigurationLoad.setMaxBadRecords(int); on the command line it's the option --max_bad_records=int, as in the sketch below.)
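
A minimal command-line sketch of that approach, assuming the data is in records.json and the destination is mydataset.mytable (the file, dataset, table, and schema names are placeholders):

    bq load --source_format=NEWLINE_DELIMITED_JSON \
        --max_bad_records=10000000 \
        mydataset.mytable records.json ./schema.json
    # rows that cannot be parsed are skipped instead of failing the whole job

Comparing the loaded row count against the 10 million records in the source file then tells you how many lines were skipped.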

Upvotes: 0
