Reputation: 4429
I'm trying to read a very large gzipped csv file in node.js. So far, I've been using zlib for this:
file.createReadStream().pipe(zlib.createGunzip())
is the stream I pass to Papa.parse. This works fine for most files, but it fails with a very large gzipped CSV file (250 MB, unzips to 1.2 GB), throwing this error:
Error: incorrect header check
at Zlib.zlibOnError [as onerror] (zlib.js:180:17) {
errno: -3,
code: 'Z_DATA_ERROR'
}
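For context, the whole pipeline looks roughly like this (a minimal sketch; `file` stands for whatever object exposes createReadStream(), and papaparse accepts a Node readable stream directly):

const zlib = require('zlib');
const Papa = require('papaparse');

// `file` is assumed to be an object with a createReadStream() method,
// e.g. a wrapper around an S3 object.
const stream = file.createReadStream().pipe(zlib.createGunzip());

Papa.parse(stream, {
  header: true,
  step: (row) => {
    // handle one CSV row at a time
  },
  complete: () => console.log('done'),
  error: (err) => console.error(err),
});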
Originally I thought the size of the file caused the error, but now I'm not so sure; maybe the file was compressed with a different algorithm. The answers to zlib.error: Error -3 while decompressing: incorrect header check suggest passing either -zlib.Z_MAX_WINDOWBITS or zlib.Z_MAX_WINDOWBITS|16 to correct for that, but I tried both and that's not the problem.
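For reference, the closest Node equivalents of those Python options would be something like this (a sketch; the mapping between the windowBits tricks and Node's stream constructors is my reading of the zlib docs):

const zlib = require('zlib');

// gzip-wrapped data (the zlib.Z_MAX_WINDOWBITS|16 case in Python)
const gunzip = zlib.createGunzip();

// raw deflate with no header (the -zlib.Z_MAX_WINDOWBITS case)
const inflateRaw = zlib.createInflateRaw();

// auto-detects a gzip or zlib header
const unzip = zlib.createUnzip();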
Upvotes: 0
Views: 686
Reputation: 4429
Despite being absolutely sure we had a gzip stream, it turns out we didn't. We got this file from an AWS S3 bucket that contained many versions of it with different timestamps, so we selected files by prefix and loaded only the most recent one.
However, the bucket also contained JSON files with metadata about these files, and they matched the same prefix. It was pure luck that for so long we always got the gzip instead of the JSON; recently that luck ran out, and the most recent match was a JSON file instead.
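One way to guard against this when listing the bucket (a sketch using @aws-sdk/client-s3; the .gz and .json extensions are assumptions about the key names, adjust to your bucket):

const { S3Client, ListObjectsV2Command } = require('@aws-sdk/client-s3');

const s3 = new S3Client({});

async function latestGzipKey(bucket, prefix) {
  const { Contents = [] } = await s3.send(
    new ListObjectsV2Command({ Bucket: bucket, Prefix: prefix })
  );
  return Contents
    .filter((obj) => obj.Key.endsWith('.gz')) // skip the .json metadata files
    .sort((a, b) => b.LastModified - a.LastModified) // newest first
    [0]?.Key;
}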
The header check error was entirely correct: the file we were looking at was not the gzip file we thought we had, so it didn't have the proper header.
I'm leaving this answer here instead of deleting the question because someone running into this error in the future may be just as sure they're gunzipping the correct file when they're actually not. Double-check which file you're loading.
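If you want that mistake to fail with a clearer message, you can peek at the first two bytes before gunzipping; a gzip stream always starts with the magic bytes 0x1f 0x8b. A sketch (it simplifies by assuming the first readable chunk holds at least two bytes):

function assertGzip(stream) {
  return new Promise((resolve, reject) => {
    stream.once('readable', () => {
      const head = stream.read(2);
      if (!head || head[0] !== 0x1f || head[1] !== 0x8b) {
        reject(new Error('Not a gzip stream - are you loading the right file?'));
        return;
      }
      stream.unshift(head); // put the bytes back so gunzip still sees them
      resolve(stream);
    });
  });
}

// usage: const checked = await assertGzip(file.createReadStream());
// checked.pipe(zlib.createGunzip())...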
Upvotes: 1