Reputation: 906
I am loading some log files into Google BigQuery and have had a process in place doing it for about 7 months. We've rebuilt the site, so I've made a new table and a nearly identical process. I can bulk-upload old files from Google Storage and most files load without incident. But when I run the same program from a cron job, BQ reports a backend error and the data are not loaded.
The files are gzipped and tab-delimited. I'm using the Python gzip package. I believe I've properly preprocessed these files by reading the original, removing all lines that don't have the proper number of fields (476 in this case), then writing the result and uploading it to Google Storage. The error almost always happens at the end of the file. What's also weird is that I've set a high tolerance for bad rows and have set BQ to read all fields as strings, and it's still not loading.
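For reference, the field check in my preprocessing is just a tab split and count. A minimal sketch (the function name is illustrative; 476 is the field count for this table):
def has_expected_fields(row, expected=476):
    # Tab-delimited row; strip the line ending before counting fields.
    return len(row.rstrip('\r\n').split('\t')) == expected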
Error loading table: {u'endTime': u'1373914994246',
 u'load': {u'inputFileBytes': u'528384',
           u'inputFiles': u'1',
           u'outputBytes': u'0',
           u'outputRows': u'4610'},
 u'startTime': u'1373914986420'}
{u'errorResult': {u'location': u'Line:4612 / Field:1',
                  u'message': u'Error reading source file',
                  u'reason': u'backendError'},
 u'errors': [{u'location': u'Line:4611 / Field:125',
              u'message': u'Bad character (ASCII 0) encountered: field starts with: <1373339>',
              u'reason': u'invalid'},
             {u'location': u'Line:4612 / Field:1',
              u'message': u'Error reading source file',
              u'reason': u'backendError'}],
 u'state': u'DONE'}
I'm downloading the files from FTP and writing them to a temporary file. I then open that file with local_file = gzip.open(fname, 'r'). I read it to check whether each row has 476 fields; if it doesn't, I write it elsewhere, and if it does, I write it locally with local_file.write(row). Then I copy it to Google Storage like so:
from subprocess import call

# upload the preprocessed file to Google Storage with gsutil
args = ['python', '/path/to/gsutil/gsutil', 'cp', local_file, folder]
call(args)
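Putting the write-and-upload step together, it looks roughly like this. The function name, paths, and bucket are placeholders rather than my exact code; the one thing the sketch makes explicit is that the gzip file is closed before gsutil runs:
import gzip
from subprocess import call

def write_and_upload(good_rows, out_fname, gcs_folder):
    # Write the rows that passed the field check to a fresh gzip file.
    out = gzip.open(out_fname, 'w')
    for row in good_rows:
        out.write(row)
    # Closing the file writes the gzip trailer before anything reads it.
    out.close()
    # Copy the finished file up to Google Storage.
    call(['python', '/path/to/gsutil/gsutil', 'cp', out_fname, gcs_folder])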
Upvotes: 0
Views: 488
Reputation: 26637
There was an error decompressing the gzip file. A workaround may be to decompress the file first. I'm still investigating.
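If you want to try that workaround in the meantime, a rough sketch is to decompress locally and upload the plain file instead, then point the load job at that (the function name and paths here are illustrative):
import gzip
import shutil
from subprocess import call

def upload_decompressed(gz_fname, out_fname, gcs_folder):
    # Decompress the gzip locally...
    with gzip.open(gz_fname, 'rb') as src, open(out_fname, 'wb') as dst:
        shutil.copyfileobj(src, dst)
    # ...then upload the uncompressed file and load that instead.
    call(['python', '/path/to/gsutil/gsutil', 'cp', out_fname, gcs_folder])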
Upvotes: 1