AnkurVj

Reputation: 8198

Python BZ2 Module Sequential decompressor: How do I find out when the complete file has been successfully decompressed?

I am using the bz2.BZ2Decompressor class to sequentially decompress a stream of bz2-compressed data. The stream may contain truncated compressed data. I need to distinguish between the case where a complete file was decompressed and the case where only part of it was. Is there any way to establish that?

To elaborate: the data I pass to the decompress() method may or may not be a complete bz2-compressed file, because it may be truncated. The method returns whatever it is able to decompress from the data it was given, but it does not tell me whether the end of the stream was reached. How do I determine that? An EOFError is only raised if additional data is supplied after the end of the stream has been found, so that does not help me.
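A minimal sketch of the situation described above, assuming Python 3 byte strings; the sample payload and the variable names are made up for illustration only:

```python
import bz2

# Hypothetical sample payload, used only for illustration.
original = b"hello world " * 1000
compressed = bz2.compress(original)

# Feed the complete stream: all of the original data comes back.
full = bz2.BZ2Decompressor()
full_output = full.decompress(compressed)

# Feed only the first half of the stream: no exception is raised,
# the call simply returns less output than the original data.
partial = bz2.BZ2Decompressor()
partial_output = partial.decompress(compressed[: len(compressed) // 2])

print(len(full_output), len(partial_output))
```

Both calls return normally, so the return value alone does not say whether the stream was complete.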

Upvotes: 0

Views: 1377

Answers (1)

Blckknght

Reputation: 104712

You can detect whether your data stream is complete by passing some extra "junk" data to the decompressor's decompress() method after the real data. If the stream was complete, the call will raise an EOFError. If the stream was truncated, it will usually not raise an EOFError, since the decompressor treats the junk as a continuation of the unfinished stream (although it may raise an IOError if the junk lands where valid bz2 data is required).

Here's some example code:

import bz2

def decompress(stream):
    decompressor = bz2.BZ2Decompressor()

    # this generator decompresses the data from the iterable stream
    results = b"".join(decompressor.decompress(data) for data in stream)

    # now we test to see if it was a complete BZ2 stream
    try:
        decompressor.decompress(b"\0") # try adding a "junk" null byte
    except EOFError: # the stream was complete!
        return results
    except IOError: # the junk may cause an IOError if it hits a BZ2 header
        pass

    # if we reach this point here, the stream was incomplete
    print("Incomplete stream!")
    return None # you may or may not want to throw away the partial results
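
On Python 3.3 and later, the junk-byte trick may not be needed at all: BZ2Decompressor has an eof attribute that becomes True once the end-of-stream marker has been reached. A minimal sketch under that assumption (decompress_checked is a hypothetical helper name, not part of the bz2 module):

```python
import bz2

def decompress_checked(chunks):
    """Return (data, complete) for an iterable of compressed byte chunks.

    Hypothetical helper for illustration; relies on BZ2Decompressor.eof,
    which is available on Python 3.3 and later.
    """
    decompressor = bz2.BZ2Decompressor()
    parts = []
    for chunk in chunks:
        if decompressor.eof:
            # The stream has already ended; feeding more data would
            # raise EOFError, so any trailing chunks are ignored here.
            break
        parts.append(decompressor.decompress(chunk))
    # eof is True only if the end-of-stream marker was actually seen
    return b"".join(parts), decompressor.eof
```

The caller can then decide what to do with partial results, for example keep the data when complete is False, or discard it as the function above does.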

Upvotes: 1
