Reputation: 8198
I am using the bz2.BZ2Decompressor class to sequentially decompress a stream of bz2-compressed data. My stream may contain truncated compressed data. I need to distinguish between the case where a complete file was decompressed and the case where only a portion of it was. Is there any way to determine that?
To elaborate: the data I supply to the decompress function may or may not be a complete bz2-compressed file; it may be truncated. When I call this function, it returns whatever it is able to decompress from the data given, but it does not tell me whether the end of the stream was reached. How do I determine that? EOFError is only raised if there is additional data after the end of the stream has been found, so that does not help me.
Upvotes: 0
Views: 1377
Reputation: 104712
You can detect whether your data stream is complete by passing some extra "junk" data to the decompressor's decompress() method. If the stream was complete, it will raise an EOFError. If the stream was truncated, it will probably not raise an EOFError, since the decompressor will assume the junk is a continuation of the truncated stream (although the junk may trigger an IOError if it lands where a BZ2 header is expected).
Here's some example code:
    import bz2

    def decompress(stream):
        decompressor = bz2.BZ2Decompressor()
        # this generator decompresses the data from the iterable stream
        # (bytes literals are used so the code runs on Python 2.6+ and Python 3)
        results = b"".join(decompressor.decompress(data) for data in stream)
        # now we test to see if it was a complete BZ2 stream
        try:
            decompressor.decompress(b"\0")  # try adding a "junk" null byte
        except EOFError:  # the stream was complete!
            return results
        except IOError:  # the junk may cause an IOError if it hits a BZ2 header
            pass
        # if we reach this point, the stream was incomplete
        print("Incomplete stream!")
        return None  # you may or may not want to throw away the partial results
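If you are on Python 3.3 or later, the junk-byte trick may not be needed: BZ2Decompressor has an eof attribute that becomes True once the end-of-stream marker has been reached. Here is a minimal sketch of that approach. The function name decompress_checked and its (data, complete) return shape are my own choices for illustration, not part of the bz2 module:

```python
import bz2

def decompress_checked(chunks):
    """Decompress an iterable of bz2-compressed byte chunks.

    Returns (data, complete), where complete is True only if the
    end-of-stream marker was seen. (Hypothetical helper, Python 3.3+.)
    """
    decompressor = bz2.BZ2Decompressor()
    parts = []
    for chunk in chunks:
        if decompressor.eof:
            # Stop before feeding more data; decompress() would raise
            # EOFError once the end of the stream has been reached.
            break
        parts.append(decompressor.decompress(chunk))
    return b"".join(parts), decompressor.eof
```

With this, a truncated input comes back with complete set to False, and you can decide whether to keep or discard the partial data. Any bytes that follow the end-of-stream marker within the last chunk fed are available in decompressor.unused_data.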
Upvotes: 1