Michael

Reputation: 191

Python: Converting a Text File to a Binary File

Any digital file can be converted to a binary representation.

I have a 1 MB text file. I want to convert it to a binary string and see the output as a binary number, and vice versa: given a binary number, I want to convert it back to a text file.

How can I do that in Python? Is there a standard way to do this?

There are some posts on this forum (1, 2, 3, 4) about this, but none of them properly answers my question.

Upvotes: 3

Views: 7527

Answers (2)

Zimba

Reputation: 3673

The "text file" mentioned seems to refer to an ASCII file, where each character is stored in one 8-bit byte.

The second line, "convert it to a binary string", could mean taking the ASCII representation of the text file, which gives a sequence of bytes to be "output as a binary number" (similar to public-key cryptography, where text is converted to a number before encryption), e.g.:

text = 'ABC '
for x in text:
  print(format(ord(x), '08b'), end='')

would give binary (number) string: 01000001010000100100001100100000
which in decimal is: 1094861600
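
As a cross-check of the decimal value above, here is a minimal sketch that builds the same bit string and converts it to an integer in two ways. The variable names are only illustrative, and it assumes the text is pure ASCII.

```python
# Encode the text as ASCII bytes, then format each byte as 8 binary digits.
data = 'ABC '.encode('ascii')
bits = ''.join(format(byte, '08b') for byte in data)

# Two equivalent routes to the integer: parse the bit string as base 2,
# or interpret the raw bytes as a big-endian integer.
as_int_from_bits = int(bits, 2)
as_int_from_bytes = int.from_bytes(data, byteorder='big')

print(bits)
print(as_int_from_bits)
```

Both routes should agree, since the bit string is just the bytes written out most significant byte first.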

The third line would mean splitting a binary number into 8-bit sequences and displaying the equivalent ASCII character for each, e.g. 0x41 is output as 'A'. The assumptions here are that each 8-bit value maps to a printable ASCII (i.e. text) character and that the given binary number has a multiple of 8 digits.

E.g. to reverse (convert a binary number to text):

binary = "01000001010000100100001100100001"
# number of 8-bit characters in the text
num = len(binary) // 8

for x in range(num):
    start = x * 8
    end = (x + 1) * 8
    # parse each 8-bit slice as a base-2 integer, then map it to its character
    print(chr(int(binary[start:end], 2)), end='')
print()

would give the text: ABC!

For a 1 MB text file, you'd split the text into chunks your machine can handle, e.g. 32 bits at a time, before converting.
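
To tie this back to actual files, here is a rough sketch of a round trip: read a file in chunks, build the bit string, then write the bit string back out as a file. The function names `file_to_bits` and `bits_to_file`, the 64 KB chunk size, and the demo file names are my own choices for illustration, not anything standard. It works on raw bytes, so it does not care which text encoding the file uses.

```python
import os
import tempfile

def file_to_bits(path, chunk_size=64 * 1024):
    """Read a file as raw bytes and return a string of '0'/'1' characters."""
    parts = []
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)  # read a manageable piece at a time
            if not chunk:
                break
            parts.append(''.join(format(byte, '08b') for byte in chunk))
    return ''.join(parts)

def bits_to_file(bits, path):
    """Write a '0'/'1' string back to a file, 8 bits per byte."""
    if len(bits) % 8 != 0:
        raise ValueError('bit string length must be a multiple of 8')
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    with open(path, 'wb') as f:
        f.write(data)

# Small demo in a temporary directory (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'input.txt')
    dst = os.path.join(tmp, 'output.txt')
    with open(src, 'w', encoding='utf-8') as f:
        f.write('ABC!')
    demo_bits = file_to_bits(src)
    bits_to_file(demo_bits, dst)
    with open(dst, encoding='utf-8') as f:
        demo_text = f.read()

print(demo_bits)
print(demo_text)
```

Note that a 1 MB file turns into a string of roughly 8 million characters, which is fine to hold in memory but not something you'd want to print in full.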

Tested in Python IDE

Upvotes: 1

ELinda

Reputation: 2821

See https://docs.python.org/3/library/codecs.html#standard-encodings for a list of standard string encodings, because the conversion depends on the encoding.

These functions will help to convert between bytes/ints and strings, defaulting to UTF-8.

The example provided uses the Hangul character "한" in UTF-8.


def bytes_to_string(byte_or_int_value, encoding='utf-8') -> str:
    # bytes: decode with the given encoding
    if isinstance(byte_or_int_value, bytes):
        return byte_or_int_value.decode(encoding)
    # int: treat the value as a Unicode code point; the encode/decode
    # round trip raises if the character cannot be represented in `encoding`
    if isinstance(byte_or_int_value, int):
        return chr(byte_or_int_value).encode(encoding).decode(encoding)
    raise ValueError('Error: Input must be a bytes or int type')

def string_to_bytes(string_value, encoding='utf-8') -> bytes:
    # str.encode already returns a bytes object
    if isinstance(string_value, str):
        return string_value.encode(encoding)
    raise ValueError('Error: Input must be a string type')

int_value = 54620
bytes_value = b'\xED\x95\x9C'
string_value = '한'

assert bytes_to_string(int_value) == string_value
assert bytes_to_string(bytes_value) == string_value
assert string_to_bytes(string_value) == bytes_value
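
To illustrate why the encoding matters for the binary string the question asks about, here is a small sketch that prints the bits of the same character under two encodings. `text_to_bits` is a hypothetical helper name, not a standard function.

```python
def text_to_bits(text, encoding='utf-8'):
    """Return the encoded bytes of `text` as space-separated 8-bit groups."""
    return ' '.join(format(byte, '08b') for byte in text.encode(encoding))

utf8_bits = text_to_bits('한', 'utf-8')        # three bytes in UTF-8
utf16_bits = text_to_bits('한', 'utf-16-be')   # two bytes in big-endian UTF-16

print(utf8_bits)
print(utf16_bits)
```

The same character yields a different number of bits depending on the encoding, so whoever converts the binary back to text has to know which encoding was used.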

Upvotes: 0
