Reputation: 14801
I have a directory with many big files. They have all been created with this line of code:
pickle.dump(variable, gzip.open(file_name, 'wb'), -1)
So they are basically compressed, serialized variables.
Now, at some point in the past, a crash or interruption (or several) might have occurred while executing that exact line. However, I do not know whether that happened.
So, first, I am assuming that if something unexpected happened, there may be a file_name in the file system which is corrupted and does not (at least fully) contain the compressed, serialized variable. Am I right here?
Now I wonder if there is a way to check the integrity of those files without having to load them into memory one by one. I am trying to avoid executing pickle.load(gzip.open(file_name, 'rb')) with try/except.
Is this possible? Is there another (faster) way to check whether pickle and gzip both finished successfully?
Upvotes: 1
Views: 2170
Reputation: 1
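I use the following method in Python 2.6. In Python 2.7 you can use a "with ... as" block instead of the explicit close.
import gzip

def is_gzip_header_valid(filepath):
    # Note: this only parses the gzip header; it does not validate the compressed data.
    f = None
    try:
        f = gzip.open(filepath, 'rb')
        f._read_gzip_header()  # private GzipFile method in Python 2
        return True
    except Exception, e:
        print e
        return False
    finally:
        # f stays None if gzip.open itself failed
        if f is not None:
            f.close()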
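The header check above only tells you that the file starts like a gzip file. To also catch truncated or damaged data without holding the whole payload in memory, one option is to decompress in fixed-size chunks and discard them. This is a minimal Python 3 sketch; the function name and the chunk_size default are my own choices, not something from the question.

```python
import gzip
import zlib

def gzip_stream_is_intact(path, chunk_size=1024 * 1024):
    """Return True if the whole gzip stream decompresses without error."""
    try:
        with gzip.open(path, 'rb') as f:
            # Read and discard fixed-size chunks so memory use stays bounded.
            while f.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # OSError covers bad headers and CRC mismatches,
        # EOFError covers a stream that ends too early.
        return False
```

Note that this says nothing about the pickle inside, and an empty file reads as an empty stream without raising, so it is worth combining with a check on the pickle data itself.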
Upvotes: 0
Reputation: 14801
Thanks to @ppperry's answer, I found a solution which is faster than de-serializing everything into memory.
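import gzip
import os

with gzip.open(file_name, 'rb') as f:
    f.seek(-1, os.SEEK_END)  # jump to the last decompressed byte
    ends_with_stop = f.read(1) == b'.'  # b'.' is pickle's STOP opcode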
Note that:
try/except
)..
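For completeness, here is a sketch of the same check wrapped in try/except so it can be run over a whole directory. The function name is made up for illustration, and the set of exceptions caught is my assumption about what a damaged file can raise in Python 3. As far as I understand, gzip emulates seeking by decompressing, so this still reads the entire stream; it only skips the unpickling.

```python
import gzip
import os
import zlib

def ends_with_pickle_stop(path):
    """Return True if the decompressed data ends with pickle's STOP opcode (b'.')."""
    try:
        with gzip.open(path, 'rb') as f:
            f.seek(-1, os.SEEK_END)  # forces decompression up to the end
            return f.read(1) == b'.'
    except (OSError, EOFError, zlib.error):
        # A truncated or otherwise unreadable gzip stream ends up here.
        return False
```

A usage example would be something like `[p for p in paths if not ends_with_pickle_stop(p)]` to list the suspicious files.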
Upvotes: 2
Reputation: 3804
Although I do not think that it is possible to check the validity of a gzip file other than by decompressing it, the pickle protocol contains a STOP opcode that should be present at the end of all pickled data. (If it is missing, unpickling will raise an EOFError.) This STOP opcode is the "." character. Thus you could partially check the validity of a pickle by checking whether it ends with the "." character. This also means that you can concatenate two valid pickles, and then unpickling the result twice will get the two objects. All pickles in protocol 2 or higher also begin with a \x80 byte (displayed as € in Windows-1252).
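To make this concrete, here is a small Python 3 sketch that checks the first and last bytes, loads two concatenated pickles from one stream, and shows that a pickle without its STOP opcode fails to load. The sample objects are arbitrary. I catch pickle.UnpicklingError as well as EOFError, since which of the two is raised can depend on the Python version and on where the data is cut.

```python
import io
import pickle

first = pickle.dumps({'a': 1}, -1)    # -1 selects the highest protocol
second = pickle.dumps([1, 2, 3], -1)

# Protocol 2+ pickles start with the PROTO opcode and end with STOP.
starts_with_proto = first[:1] == b'\x80'
ends_with_stop = first[-1:] == b'.'

# Two concatenated pickles can be read back with two load() calls.
stream = io.BytesIO(first + second)
loaded = [pickle.load(stream), pickle.load(stream)]

# Dropping the final STOP byte makes the pickle unloadable.
try:
    pickle.loads(first[:-1])
    truncated_fails = False
except (EOFError, pickle.UnpicklingError):
    truncated_fails = True
```

Keep in mind that the last-byte test is only a partial check: data cut off in the middle could happen to end in a "." byte as well.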
Upvotes: 2