Reputation: 109
I have a long array of items (4700) that will ultimately be 1 or 0 when compared to settings in another list. I want to be able to construct a single integer/string item that I can store in some of the metadata such that it can be accessed later in order to uniquely identify the combination of items that goes into it.
I am writing this all in Python. I am thinking of doing something like zlib compression plus a hex conversion, but I am confused about how to do the inverse transformation. Assuming bin_string is the string of 1's and 0's, it should look something like this:
import zlib
#example bin_string, real one is much longer
bin_string="1001010010100101010010100101010010101010000010100101010"
compressed = zlib.compress(bin_string.encode())
this_hex = compressed.hex()
where I can then save this_hex to the metadata. The question is, how do I get the original bin_string back from my hex value? I have lots of Python experience with numerical methods and such but little with compression, so any basic insights would be very valuable.
Upvotes: 0
Views: 3012
Reputation: 112374
Just do the inverse of each operation. This:
zlib.decompress(bytearray.fromhex(this_hex)).decode()
will return your original string.
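As a minimal sketch of the full round trip, using the example bin_string from the question (bytes.fromhex is used here in place of bytearray.fromhex; either works with zlib.decompress):

```python
import zlib

# Example bit string from the question; the real one is much longer.
bin_string = "1001010010100101010010100101010010101010000010100101010"

# Forward: text -> bytes -> compressed bytes -> hex string for the metadata.
this_hex = zlib.compress(bin_string.encode()).hex()

# Inverse: undo each step in reverse order.
restored = zlib.decompress(bytes.fromhex(this_hex)).decode()

assert restored == bin_string
```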
It would be faster and might even result in better compression to simply encode your bits as bits in a byte string, along with a terminating one bit followed by zeros to pad out the last byte. That would be seven bytes instead of the 22 you're getting from zlib.compress(). zlib would do better only if there is a strong bias for 0's or 1's, and/or there are repeating patterns in the 0's and 1's.
As for encoding for the metadata, Base64 would be more compact than hexadecimal. Your example would be lKVKVKoKVQ==.
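One way the bit packing and Base64 encoding described above could be sketched is shown here. The helper names pack_bits and unpack_bits are made up for illustration, and going through Python's int is just one convenient way to do the packing, not the only one:

```python
import base64

def pack_bits(bin_string):
    """Pack a string of '0'/'1' characters into bytes (hypothetical helper)."""
    # Append the terminating one bit, then zeros to fill out the last byte.
    padded = bin_string + "1"
    padded += "0" * (-len(padded) % 8)
    return int(padded, 2).to_bytes(len(padded) // 8, "big")

def unpack_bits(data):
    """Inverse of pack_bits: recover the original '0'/'1' string."""
    bits = bin(int.from_bytes(data, "big"))[2:].zfill(len(data) * 8)
    # The last one bit is the terminator; everything before it is payload.
    return bits[:bits.rindex("1")]

bin_string = "1001010010100101010010100101010010101010000010100101010"

packed = pack_bits(bin_string)
encoded = base64.b64encode(packed).decode("ascii")
print(len(packed), encoded)  # 7 lKVKVKoKVQ==

# Round trip from the stored metadata string back to the bits.
assert unpack_bits(base64.b64decode(encoded)) == bin_string
```

The terminating one bit is what lets the decoder tell real trailing zeros apart from padding, so the original length does not need to be stored separately.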
Upvotes: 2
Reputation: 176
You could try NumPy's savez_compressed() function.
Convert your list into a NumPy array (called arr here) and then save it like this:
numpy.savez_compressed("filename.npz", arr)
Then use
numpy.load("filename.npz")
to load the .npz file.
Upvotes: 1