Reputation: 97
I am trying to compress files using lz4.frame and decompress them with their original filenames. To do this, I write metadata (the filename length, the filename, and the file size) before each file's compressed data.
As the decompression output below shows, the filename length read for the second file is 29742, which does not match the 9 shown in the compression output.
That is how I found out why I get the error: archive.read(file_name_length).decode('utf-8') tries to decode the bytes as UTF-8, but the bytes it reads also contain compressed data. The first file decompresses successfully, but the second one fails.
I have been trying to solve this for a couple of days without finding a solution. I hope my question is clear.
My code:
output of compression:
file name length 9
file name file1.txt
file size 762
file name length 9
file name file2.txt
file size 1105
file name length 9
file name file3.txt
file size 1472
output of decompression:
file name length 9
file name file1.txt
file size 762
file name length 29742
Traceback (most recent call last):
  File "d:\Code Playground\Python\fileSharing\fileSharing (2).py", line 250, in <module>
    decompression("archive.lz4")
  File "d:\Code Playground\Python\fileSharing\fileSharing (2).py", line 166, in decompression
    file_name = archive.read(file_name_length).decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc5 in position 167: invalid continuation byte
Upvotes: 0
Views: 152
Reputation: 97
Ah, I see. I figured out how a minor mistake broke the whole thing.
This is the code I am using for compression:
import lz4.frame

def compression(files):
    '''
    Take a nested list in which each item is [filename, size].
    '''
    with open('archive.lz4', 'wb') as archive:
        for file_name in files:
            # Write file metadata as length-prefixed string to the archive
            file_name_length = len(file_name[0])
            print("file name length", file_name_length)
            archive.write(file_name_length.to_bytes(2, byteorder='big'))
            print("file name", file_name[0])
            archive.write(file_name[0].encode('utf-8'))
            # file_name[1] is the size of the original, uncompressed file
            print("file size", file_name[1])
            archive.write(int(file_name[1]).to_bytes(4, byteorder='big'))
            # Compress the file and write it to the archive
            with open(file_name[0], 'rb') as file:
                compressed_data = lz4.frame.compress(file.read())
                archive.write(compressed_data)
The mistake is that the file size I write is the actual (uncompressed) file size, obtained from os.path.getsize(filename).
The compressed data has a different size. In decompression(), the file size is decoded with file_size = int.from_bytes(archive.read(4), byteorder='big') and the compressed data is then read with compressed_data = archive.read(file_size). Here file_size is not the size of the compressed data, so the read ends at the wrong offset and the metadata of the next file is read from the wrong position.
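One way to fix this is to compress first and store the length of the compressed bytes in the header, so the reader knows exactly how many bytes belong to each file. The sketch below is only an illustration of that framing, not my exact code: the names write_archive and read_archive are made up, and the compressor is passed in as a parameter. It defaults to zlib from the standard library so it runs without extra packages; passing lz4.frame.compress and lz4.frame.decompress should work the same way, since both take and return bytes. It also takes the name length from the encoded bytes, which matters for non-ASCII filenames.

```python
import os
import zlib


def write_archive(archive_path, file_paths, compress=zlib.compress):
    """Write each file as: name length (2 B) | name | compressed size (4 B) | data."""
    with open(archive_path, "wb") as archive:
        for path in file_paths:
            name_bytes = os.path.basename(path).encode("utf-8")
            with open(path, "rb") as src:
                compressed = compress(src.read())
            # Length of the encoded name in bytes, not in characters.
            archive.write(len(name_bytes).to_bytes(2, byteorder="big"))
            archive.write(name_bytes)
            # Store the size of the compressed payload: this is the number
            # of bytes the reader has to consume before the next header.
            archive.write(len(compressed).to_bytes(4, byteorder="big"))
            archive.write(compressed)


def read_archive(archive_path, out_dir, decompress=zlib.decompress):
    """Extract every record into out_dir and return the file names in order."""
    names = []
    with open(archive_path, "rb") as archive:
        while True:
            header = archive.read(2)
            if not header:  # clean end of the archive
                break
            name_length = int.from_bytes(header, byteorder="big")
            name = archive.read(name_length).decode("utf-8")
            compressed_size = int.from_bytes(archive.read(4), byteorder="big")
            data = decompress(archive.read(compressed_size))
            with open(os.path.join(out_dir, name), "wb") as dst:
                dst.write(data)
            names.append(name)
    return names
```

With lz4 installed, the calls would look like write_archive("archive.lz4", ["file1.txt", "file2.txt"], compress=lz4.frame.compress) and read_archive("archive.lz4", "out", decompress=lz4.frame.decompress). If the original size is also needed, it can be stored as an extra header field next to the compressed size.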
Upvotes: 0