I am trying to read a ZIP file (in Python 2.7.2) byte by byte. I can get through the local file headers and the file data, but I am stuck when trying to read the central directory file headers.
This diagram helps a lot: http://en.wikipedia.org/wiki/File:ZIP-64_Internal_Layout.svg
I don't know how to find out how many entries the archive contains, which I need in order to know when to switch from parsing local file headers to parsing central directory file headers. Is there another way to tell when to make that switch?
This is what I have right now:
```python
import struct
import sys

# Fixed 30-byte part of a local file header, little-endian:
# signature, version needed, flags, compression method, mod time, mod date,
# CRC-32, compressed size, uncompressed size, name length, extra length
LOCAL_HEADER = struct.Struct("<4s5H3L2H")


def main(debug=0, arg_file=''):
    if debug == 2:
        print "- Opening %s" % arg_file
    # ZIP archives are binary: open in 'rb' so no newline translation happens
    with open(arg_file, 'rb') as archive:
        if debug == 2:
            print "- Reading %s" % arg_file
        raw = archive.read()
    if debug == 2:
        print "-------------Binary-------------"
        print repr(raw)
    # Walk the local file headers; this loop has no way of knowing
    # where the local headers end and the central directory begins
    end = 0
    while end != len(raw):
        print end
        end = process_sub_file(debug, end, raw)


def process_sub_file(debug, startbytes, raw):
    # Multi-byte fields are little-endian integers, so decode them with
    # struct instead of adding up the individual byte values
    (header, version, flags, comp_method, mod_time, mod_date,
     crc, comp_size, uncomp_size, name_len, extra_len) = \
        LOCAL_HEADER.unpack_from(raw, startbytes)
    # Note: if bit 3 of flags is set, the sizes above may be zero and the
    # real values follow the file data in a data descriptor
    pos = startbytes + LOCAL_HEADER.size
    file_name = raw[pos:pos + name_len]
    pos += name_len
    extra_field = raw[pos:pos + extra_len]
    pos += extra_len
    data = raw[pos:pos + comp_size]
    if debug >= 1:
        print "-------------Header-------------"
        print "Header Signature: %r" % header
        print "Version: %s" % version
        print "Flags: 0x%04x" % flags
        print "Compression Method: %s" % comp_method
        print "Modification Time: %s" % mod_time
        print "Modification Date: %s" % mod_date
        print "CRC-32: 0x%08x" % crc
        print "Compressed Size: %s" % comp_size
        print "Uncompressed Size: %s" % uncomp_size
        print "File Name Length: %s" % name_len
        print "Extra Field Length: %s" % extra_len
        print "File Name: %s" % file_name
        print "Extra Field: %r" % extra_field
        print "Data:\n%s" % data
    return pos + comp_size


if __name__ == '__main__':
    main(debug=1, arg_file=sys.argv[1])
```
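Search backwards from the end of the file for the "End of Central Directory" (EOCD) record. It sits at the very end of the archive, followed only by an optional archive comment of at most 65,535 bytes, and it contains the total number of entries in the central directory.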
The exact record layout is described under "End of central directory record:" in http://www.pkware.com/documents/casestudies/APPNOTE.TXT
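A minimal sketch of that backwards search, assuming the whole archive has already been read into a byte string (as `archive.read()` does in the question). `find_eocd` is a hypothetical helper name, not part of any library. Because the signature bytes can also appear inside the archive comment, the sketch only accepts a candidate whose comment length field makes the record end exactly at the end of the file; that is a stricter check than some readers apply.

```python
import struct

EOCD_SIGNATURE = b"PK\x05\x06"            # 0x06054b50, little-endian on disk
EOCD_FORMAT = "<4s4H2LH"                   # fixed part of the EOCD record
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)   # 22 bytes
MAX_COMMENT = 0xFFFF                       # comment length is a 16-bit field


def find_eocd(data):
    """Return (offset, fields) for the End of Central Directory record."""
    # The record can start no earlier than 22 + 65535 bytes before the end.
    search_start = max(0, len(data) - EOCD_SIZE - MAX_COMMENT)
    pos = data.rfind(EOCD_SIGNATURE, search_start)
    while pos != -1:
        if pos + EOCD_SIZE <= len(data):
            (_sig, disk, cd_disk, disk_entries, total_entries,
             cd_size, cd_offset, comment_len) = struct.unpack(
                EOCD_FORMAT, data[pos:pos + EOCD_SIZE])
            # Accept only if the comment ends exactly at the end of the file.
            if pos + EOCD_SIZE + comment_len == len(data):
                return pos, {
                    "disk": disk,
                    "cd_disk": cd_disk,
                    "disk_entries": disk_entries,
                    "total_entries": total_entries,  # 0xFFFF means "see Zip64"
                    "cd_size": cd_size,
                    "cd_offset": cd_offset,
                    "comment_len": comment_len,
                }
        # False match (e.g. inside the comment): look further back.
        pos = data.rfind(EOCD_SIGNATURE, search_start, pos)
    raise ValueError("End of Central Directory record not found")
```

`total_entries` answers "how many items are in the archive", and `cd_offset` is where the central directory file headers start.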
If the "total number of entries in the central directory" field is 0xFFFF, the real count did not fit in that 16-bit field and you have to use the Zip64 records instead. A "Zip64 End of Central Directory Locator" sits directly before the EOCD record and holds the offset of the "Zip64 End of Central Directory" record, which contains the actual number of entries in the central directory as a 64-bit value.
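Following the Zip64 chain can be sketched as below. `read_zip64_eocd` is a hypothetical helper that takes the archive bytes and the offset of the classic EOCD record. It assumes a single-disk archive with no data prepended (for example, no self-extracting stub), so the offset stored in the locator can be used directly as a position in the byte string.

```python
import struct

ZIP64_LOCATOR_SIG = b"PK\x06\x07"    # 0x07064b50
ZIP64_LOCATOR_FORMAT = "<4sLQL"       # 20 bytes
ZIP64_EOCD_SIG = b"PK\x06\x06"        # 0x06064b50
ZIP64_EOCD_FORMAT = "<4sQ2H2L4Q"      # fixed 56-byte part of the record


def read_zip64_eocd(data, eocd_offset):
    """Return the Zip64 EOCD fields, or None if there is no Zip64 locator."""
    loc_size = struct.calcsize(ZIP64_LOCATOR_FORMAT)
    loc_pos = eocd_offset - loc_size
    # The locator, if present, sits directly before the classic EOCD record.
    if loc_pos < 0 or data[loc_pos:loc_pos + 4] != ZIP64_LOCATOR_SIG:
        return None
    _sig, _disk, rec_offset, _disks = struct.unpack(
        ZIP64_LOCATOR_FORMAT, data[loc_pos:eocd_offset])
    # The locator points at the Zip64 End of Central Directory record.
    rec_size = struct.calcsize(ZIP64_EOCD_FORMAT)
    record = data[rec_offset:rec_offset + rec_size]
    if len(record) != rec_size or record[:4] != ZIP64_EOCD_SIG:
        raise ValueError("Zip64 locator does not point at a Zip64 EOCD record")
    (_sig, _rec_len, _made_by, _needed, _disk, _cd_disk,
     disk_entries, total_entries, cd_size, cd_offset) = struct.unpack(
        ZIP64_EOCD_FORMAT, record)
    return {
        "disk_entries": disk_entries,
        "total_entries": total_entries,  # 64-bit entry count
        "cd_size": cd_size,
        "cd_offset": cd_offset,
    }
```

When this returns a dict, its `total_entries` and `cd_offset` replace the 0xFFFF / 0xFFFFFFFF placeholders in the classic EOCD record.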
The EOCD record (or its Zip64 counterpart) also contains the offset of the start of the central directory. Seek to that offset and read the central directory file headers one after another; the entry count tells you how many there are, so you know when to stop.
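Once you have the central directory offset and the entry count, walking the headers is a loop over fixed 46-byte structures, each followed by a file name, an extra field and a file comment. A minimal sketch, with `iter_central_directory` as a hypothetical helper name; it does not decode the Zip64 extended information extra field, so sizes or offsets stored as 0xFFFFFFFF are returned as-is.

```python
import struct

CDFH_SIGNATURE = b"PK\x01\x02"            # 0x02014b50, little-endian on disk
CDFH_FORMAT = "<4s6H3L5H2L"                # fixed part of each header
CDFH_SIZE = struct.calcsize(CDFH_FORMAT)   # 46 bytes


def iter_central_directory(data, cd_offset, total_entries):
    """Yield one dict per central directory file header."""
    pos = cd_offset
    for _ in range(total_entries):
        fixed = data[pos:pos + CDFH_SIZE]
        if len(fixed) != CDFH_SIZE or fixed[:4] != CDFH_SIGNATURE:
            raise ValueError("no central directory header at offset %d" % pos)
        (_sig, _made_by, _needed, flags, method, mod_time, mod_date,
         crc, comp_size, uncomp_size, name_len, extra_len, comment_len,
         _disk_start, _int_attr, _ext_attr, local_offset) = struct.unpack(
            CDFH_FORMAT, fixed)
        pos += CDFH_SIZE
        # Three variable-length fields follow the fixed part, in this order.
        name = data[pos:pos + name_len]
        pos += name_len
        extra = data[pos:pos + extra_len]
        pos += extra_len
        comment = data[pos:pos + comment_len]
        pos += comment_len
        yield {
            "name": name,
            "flags": flags,
            "method": method,
            "mod_time": mod_time,
            "mod_date": mod_date,
            "crc": crc,
            "comp_size": comp_size,
            "uncomp_size": uncomp_size,
            "local_offset": local_offset,  # where this entry's local header is
            "extra": extra,
            "comment": comment,
        }
```

Each entry's `local_offset` points back at its local file header, so this also lets you jump straight to each file's data instead of walking the local headers sequentially.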