Reputation: 23
I need to split a .bin file into chunks, but I run into a problem when writing the output to the new, split binary file. The output is inconsistent: I can see the data, but there are shifts and gaps everywhere when I compare the split binary with the larger original.
def hash_file(filename: str, blocksize: int = 4096) -> str:
    blocksCount = 0
    with open(filename, "rb") as f:
        while True:
            #Read a new chunk from the binary file
            full_string = f.read(blocksize)
            if not full_string:
                break
            new_string = ' '.join('{:02x}'.format(b) for b in full_string)
            split_string = ''.join(chr(int(i, 16)) for i in new_string.split())
            #Append the split chunk to the new binary file
            newf = open("SplitBin.bin","a", encoding="utf-8")
            newf.write(split_string)
            newf.close()
            #Check if the desired number of mem blocks has been reached
            blocksCount = blocksCount + 1
            if blocksCount == 1:
                break
Upvotes: 1
Views: 361
Reputation: 308216
For characters with ordinals between 0 and 0x7f, the UTF-8 representation is a single byte equal to the original byte value. But for characters with ordinals between 0x80 and 0xff, UTF-8 outputs two bytes in place of the one input byte (for example, 0xe9 becomes 0xc3 0xa9). That's why you're seeing inconsistencies: every such byte changes and pushes everything after it one position further along.
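A small sketch of the effect, mimicking the question's chr-then-encode round trip. The helper name `roundtrip_via_utf8` is made up for illustration and is not part of the original code:

```python
def roundtrip_via_utf8(data: bytes) -> bytes:
    """Turn each byte into a character, then encode the text as UTF-8,
    which is what writing to a file opened with encoding="utf-8" does."""
    text = ''.join(chr(b) for b in data)
    return text.encode("utf-8")


# Bytes below 0x80 come back unchanged.
print(roundtrip_via_utf8(b"\x00\x41\x7f"))  # b'\x00A\x7f'

# Bytes from 0x80 upward each turn into two bytes.
print(roundtrip_via_utf8(b"\x80\xe9\xff"))  # b'\xc2\x80\xc3\xa9\xc3\xbf'
```

Three input bytes became six in the second case, which matches the kind of shifts and gaps described in the question.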
The easiest way to fix it would be to open the output file in binary mode as well. Then you can eliminate all the formatting and splitting, because you can directly write the data you just read:
with open("SplitBin.bin", "ab") as newf:
    newf.write(full_string)
Note that reopening the file for every write adds needless overhead and will be very slow for large inputs. It is better to open it once and leave it open until you're done.
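Putting both suggestions together, one possible version of the whole function might look like the sketch below. The name `split_file` and the `dst` and `max_blocks` parameters are assumptions added for illustration, and the output is opened with "wb" rather than "ab" so that a rerun starts from an empty file; switch to "ab" if appending across runs is what you want.

```python
def split_file(src: str, dst: str, blocksize: int = 4096, max_blocks: int = 1) -> int:
    """Copy up to max_blocks blocks of blocksize bytes from src to dst.

    Returns the number of bytes written.
    """
    written = 0
    # Open both files once, in binary mode, so bytes are copied unchanged.
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for _ in range(max_blocks):
            chunk = fin.read(blocksize)
            if not chunk:
                break  # reached the end of the source file
            fout.write(chunk)
            written += len(chunk)
    return written
```

With the defaults, `split_file("input.bin", "SplitBin.bin")` copies only the first 4096 bytes, matching the single-block limit in the question ("input.bin" is a placeholder file name).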
Upvotes: 1