Manuel Araoz

Reputation: 16416

How do you store raw bytes as text without losing information in Python 2.x?

Suppose I have any data stored in bytes. For example:

0110001100010101100101110101101

How can I store it as printable text? The obvious way would be to convert every 0 to the character '0' and every 1 to the character '1'. In fact this is what I'm currently doing. I'd like to know how I could pack them more tightly, without losing information.

I thought of converting bits in groups of eight to ASCII, but some bit combinations are not accepted in that format. Any other ideas?

Upvotes: 1

Views: 7386

Answers (4)

Fire Lancer

Reputation: 30145

What about an encoding that only uses "safe" characters like base64?
http://en.wikipedia.org/wiki/Base64

EDIT: That is assuming that you want to safely store the data in text files and such?

In Python 2.x, strings should be fine (Python doesn't use null terminated strings, so don't worry about that).

Else in 3.x check out the bytes and bytearray objects. http://docs.python.org/3.0/library/stdtypes.html#bytes-methods
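As a rough sketch of what "safe" means here, the snippet below base64-encodes all 256 possible byte values and checks that the result uses only letters, digits, '+', '/' and '='. The variable names are illustrative, and the code is written so that it should run on both Python 2.6+ and 3.x.

```python
import base64
import string

# Every possible byte value, including NUL and the non-ASCII range.
raw = bytes(bytearray(range(256)))

encoded = base64.b64encode(raw)
decoded = base64.b64decode(encoded)

# The standard base64 alphabet plus the '=' padding character.
alphabet = set(bytearray((string.ascii_letters + string.digits + "+/=").encode("ascii")))
only_safe = all(ch in alphabet for ch in bytearray(encoded))

print(only_safe, decoded == raw, len(raw), len(encoded))
```

The encoded form is about 4/3 the size of the raw bytes, which is the price paid for sticking to printable characters.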

Upvotes: 7

Tim Swena

Reputation: 14786

For Python 2.x, your best bet is to store them in a string. Once you have that string, you can encode it into safe ASCII characters using the base64 module that comes with Python.

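import base64
# bytestring is the str that holds your raw bytes
encoded = base64.b64encode(bytestring)
# base64.b64decode(encoded) gives back the original bytes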

This will be much more condensed than storing "1" and "0".

For more information on the base64 module, see the Python docs.
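If the data starts out as a string of '0' and '1' characters, as in the question, one possible approach is to pack eight bits per byte first and then base64-encode the result. The helpers bits_to_text and text_to_bits below are hypothetical names, and the "length:" prefix is just one assumed way to remember how many bits were real, since the last byte has to be zero-padded.

```python
import base64

def bits_to_text(bits):
    """Pack a string of '0'/'1' characters into short printable text."""
    # Pad with zeros up to a multiple of 8 so every chunk fills one byte.
    padded = bits + "0" * (-len(bits) % 8)
    raw = bytearray(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    payload = base64.b64encode(bytes(raw)).decode("ascii")
    # Keep the original bit count so the padding can be dropped later.
    return "%d:%s" % (len(bits), payload)

def text_to_bits(text):
    """Reverse bits_to_text and return the original '0'/'1' string."""
    length, _, payload = text.partition(":")
    raw = bytearray(base64.b64decode(payload))
    bits = "".join(format(byte, "08b") for byte in raw)
    return bits[:int(length)]

bits = "0110001100010101100101110101101"
encoded = bits_to_text(bits)
print(len(bits), len(encoded), text_to_bits(encoded) == bits)
```

For the 31-bit example from the question this gives 11 characters instead of 31, and the saving grows with longer inputs because the length prefix stays small.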

Upvotes: 0

S.Lott

Reputation: 391952

Not sure what you're talking about.

>>> sample = "".join( chr(c) for c in range(256) )
>>> len(sample)
256
>>> sample
'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\
x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABC
DEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83
\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97
\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab
\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf
\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3
\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7
\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb
\xfc\xfd\xfe\xff'

The string sample contains all 256 distinct bytes. There is no such thing as "bit combinations ... not accepted".

To make it printable, simply use repr(sample) -- non-printable and non-ASCII bytes are escaped, as you can see above.
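A small sketch of the repr round trip, assuming ast.literal_eval is used to read the text back (that part is not in the answer above). It is written to run on Python 2.6+ as well as 3.x, where the repr simply gains a b prefix.

```python
import ast

# All 256 byte values in one byte string.
data = bytes(bytearray(range(256)))

# repr() yields printable ASCII text with escapes such as \x00.
text = repr(data)

# literal_eval parses the escaped literal back into the original bytes.
restored = ast.literal_eval(text)

print(restored == data, len(data), len(text))
```

Note that each escaped byte takes four characters, so for mostly non-printable data this is lossless but less compact than base64.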

Upvotes: 3

pts

Reputation: 87321

Try the standard array module or the struct module. These support storing bytes in a space-efficient way -- but they don't support bits directly.

You can also try http://cobweb.ecn.purdue.edu/~kak/dist/BitVector-1.2.html or http://ilan.schnell-web.net/prog/bitarray/
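As an illustration of the struct route, here is a minimal sketch that packs a few byte-sized integers into a compact byte string and unpacks them again. The values are the question's example bits grouped into bytes, with the last group zero-padded, which is an assumption about how the bits would be split.

```python
import struct

# 01100011 00010101 10010111 0101101(0) as four unsigned bytes.
values = [0x63, 0x15, 0x97, 0x5A]

# "4B" means four unsigned chars, one byte each.
packed = struct.pack("%dB" % len(values), *values)
unpacked = list(struct.unpack("%dB" % len(packed), packed))

print(len(packed), unpacked == values)
```

The packed result is raw bytes rather than printable text, so it would still need something like base64 on top if it has to live in a text file.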

Upvotes: 1
