Reputation: 13
I need to count how many times a number appears in a gzip file with 2912232966 lines. I have the following:
import gzip

count = 0
# `file` is the path to the .gz file
with gzip.open(file, 'rb') as f:
    for line in f:
        lin = line.decode('utf-8')
        # the number is the first tab-separated field on each line
        number = lin[:lin.index('\t')]
        if number == '2719708':
            count += 1
but I get this error: 'CRC check failed 0xabc8df68 != 0xba1760acL'
It only works up to 400000000 lines. Help, please.
Upvotes: 1
Views: 2525
Reputation: 1007
link to zlib
Quote from jiffyclub's answer here:
The issue with the gzip module is not that it can't decompress the partial file, the error occurs only at the end when it tries to verify the checksum of the decompressed content. (The original checksum is stored at the end of the compressed file so the verification will never, ever work with a partial file.)
The key is to trick gzip into skipping the verification. The answer by caesar0301 does this by modifying the gzip source code, but it's not necessary to go that far, simple monkey patching will do. I wrote this context manager to temporarily replace gzip.GzipFile._read_eof while I decompress the partial file:
This looks to be exactly what you need.
Go to that link and read the entire response.
Found by searching Google for a Stack Exchange link to "python gzip crc check failed"; it was the first result.
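If you would rather not monkey patch gzip, a rough alternative is to go through zlib directly. This is not jiffyclub's context manager, just a minimal sketch of the same idea (skip the checksum verification), assuming Python 3 and a file with a single gzip member. It skips the gzip header by hand and decompresses the body as a raw deflate stream, so the CRC/length trailer is never looked at. The names count_first_field and _skip_gzip_header are made up for this example.

```python
import struct
import zlib


def _skip_gzip_header(f):
    """Advance f past the gzip header (layout per RFC 1952)."""
    header = f.read(10)
    if len(header) < 10 or header[:2] != b"\x1f\x8b" or header[2] != 8:
        raise ValueError("not a gzip (deflate) file")
    flags = header[3]
    if flags & 4:  # FEXTRA: 2-byte little-endian length, then that many bytes
        (xlen,) = struct.unpack("<H", f.read(2))
        f.read(xlen)
    for bit in (8, 16):  # FNAME, FCOMMENT: zero-terminated strings
        if flags & bit:
            while f.read(1) not in (b"\x00", b""):
                pass
    if flags & 2:  # FHCRC: 2-byte CRC of the header
        f.read(2)


def count_first_field(path, target, chunk_size=1 << 20):
    """Count lines whose first tab-separated field equals target (bytes)."""
    count = 0
    prefix = target + b"\t"
    with open(path, "rb") as f:
        _skip_gzip_header(f)
        # Negative wbits means raw deflate: zlib does not expect or check
        # the gzip trailer that holds the CRC32 and the uncompressed size.
        d = zlib.decompressobj(-zlib.MAX_WBITS)
        tail = b""  # incomplete last line carried over between chunks
        while not d.eof:
            chunk = f.read(chunk_size)
            if not chunk:
                break  # truncated file: keep what has been counted so far
            lines = (tail + d.decompress(chunk)).split(b"\n")
            tail = lines.pop()
            count += sum(1 for line in lines if line.startswith(prefix))
        if tail.startswith(prefix):  # last line without a trailing newline
            count += 1
    return count
```

Usage would be something like count_first_field(file, b'2719708'). Note that this only sidesteps the final checksum comparison. If the compressed data itself is damaged, zlib will still raise zlib.error partway through, and a mismatching CRC can mean the decompressed data is wrong, so treat the count with some suspicion.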
Upvotes: 1