Reputation: 3
I've been trying to write a program that extracts all the JPEG files from a selected disk image. I know there are 7 JPEG files in the disk image I'm testing with, yet my code extracts only 2 of them. What might I be doing wrong to cause this?
#!/usr/bin/python
import sys
from binascii import hexlify

def main():
    filename = 'disk.img'
    i = 1
    f = open(filename, 'rb')
    for data in iter(lambda:f.read(4), ""):
        if (data == '\xff\xd8\xff\xe1' or data == '\xff\xd8\xff\xe0'):
            print data.encode('hex')
            print f.tell()
            while(data != '\xff\xd9'):
                new_filename = "%03d.jpg" % i
                newfile = open(new_filename, 'ab')
                newfile.write(data)
                data = f.read(2)
                newfile.close()
            print "%03d.jpg extracted!" % i
            i = i+1
            #position = f.tell()
            #f.seek(position+16)
    f.close()
    print "EOF"

if __name__ == '__main__':
    main()
Upvotes: 0
Views: 1702
Reputation: 33993
There are existing tools for that. See http://www.cgsecurity.org/wiki/PhotoRec
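I suppose the problem with the sample code is that it reads a fixed number of bytes at a time: four in the outer loop and two in the inner loop. The outer loop only ever compares chunks that start at offsets divisible by four, so a JPEG whose start marker begins at any other offset is never found. Likewise, the inner loop advances in two-byte steps, so it misses an end marker that straddles two reads.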
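One way around the alignment problem is to search for the markers at every byte offset instead of comparing fixed-size chunks. The following is only a minimal sketch in Python 3, not a replacement for a real carving tool: the names `carve_jpegs` and `extract` are made up for illustration, it assumes the image is small enough to read into memory, and it assumes each file is stored contiguously and ends at the first `\xff\xd9` after its header (which is not true for JPEGs with embedded thumbnails or for fragmented files).

```python
# Start markers from the question (JFIF and Exif headers) and the end-of-image marker.
SOI_MARKERS = (b"\xff\xd8\xff\xe0", b"\xff\xd8\xff\xe1")
EOI_MARKER = b"\xff\xd9"


def carve_jpegs(data):
    """Return each byte range from a start marker through the next end marker."""
    images = []
    pos = 0
    while True:
        # bytes.find searches at every byte offset, not only multiples of four.
        starts = [s for s in (data.find(m, pos) for m in SOI_MARKERS) if s != -1]
        if not starts:
            break
        start = min(starts)
        end = data.find(EOI_MARKER, start + 4)
        if end == -1:
            break  # no end marker left; stop instead of looping forever
        images.append(data[start:end + 2])  # include the 2-byte end marker
        pos = end + 2
    return images


def extract(image_path="disk.img"):
    """Write every carved JPEG to 001.jpg, 002.jpg, ... and return the count."""
    # Assumption: the whole image fits in memory.
    with open(image_path, "rb") as f:
        data = f.read()
    images = carve_jpegs(data)
    for i, jpeg in enumerate(images, start=1):
        with open("%03d.jpg" % i, "wb") as out:
            out.write(jpeg)
    return len(images)
```

Calling `extract("disk.img")` would write the carved files next to the script. For large images, reading in overlapping blocks or using `mmap` would avoid loading everything at once.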
Upvotes: 1