user748176

Reputation:

How to Read-in Binary of a File in Python

In Python, when I try to read in an executable file with 'rb', instead of getting the binary values I expected (0010001 etc.), I'm getting a series of letters and symbols that I do not know what to do with.

Ex: ???}????l?S??????V?d?\?hG???8?O=(A).e??????B??$????????:    ???Z?C'???|lP@.\P?!??9KRI??{F?AB???5!qtWI??8𜐮???!ᢉ?]?zъeF?̀z??/?n??

How would I access the binary numbers of a file in Python?

Any suggestions or help would be appreciated. Thank you in advance.

Upvotes: 2

Views: 6105

Answers (5)

Scott Griffiths

Reputation: 21925

Each character in the string represents one byte of the file, displayed as its ASCII character where one exists. If you want a string of zeros and ones instead, convert each byte to an integer, format it as 8 binary digits, and join everything together:

>>> s = "hello world"
>>> ''.join("{0:08b}".format(ord(x)) for x in s)
'0110100001100101011011000110110001101111001000000111011101101111011100100110110001100100'
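The session above is Python 2, where iterating over a str yields one-character strings. Here is a minimal sketch of the same idea assuming Python 3, where iterating over a bytes object yields integers directly, so ord() is not needed. The helper name to_bit_string is made up for illustration:

```python
def to_bit_string(data: bytes) -> str:
    """Return the bits of `data` as a string of '0' and '1', 8 per byte."""
    # Assumption: Python 3, where each element of a bytes object is an int (0-255).
    return "".join("{0:08b}".format(byte) for byte in data)


bits = to_bit_string(b"hello world")
print(bits[:16])  # the bits of the first two bytes, 'h' and 'e'
```

For a real file, the same helper could be applied to the result of open(path, 'rb').read().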

If you really need to analyse or manipulate data at the bit level, an external module such as bitstring could be helpful. Check out the docs; to just get the binary interpretation, use something like:

>>> import bitstring
>>> f = open('somefile', 'rb')
>>> b = bitstring.Bits(f)
>>> b.bin
0100100101001001...

Upvotes: 3

rocksportrocker

Reputation: 7429

If you really want to convert the binary bytes to a stream of bits, you have to remove the first two characters ('0b') from the output of bin(), reverse the result, and pad it to eight digits:

with open("settings.dat", "rb") as fp:
    print "".join( (bin(ord(c))[2:][::-1]).ljust(8,"0") for c in fp.read() )

Python versions prior to 2.6 have no bin() function.
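Note that reversing each byte's digits emits the least significant bit first, which differs from the usual most-significant-bit-first way of writing binary. A small sketch comparing the two orders, assuming Python 3 and using hypothetical function names:

```python
def bits_msb_first(byte: int) -> str:
    """Conventional notation: most significant bit on the left."""
    return format(byte, "08b")


def bits_lsb_first(byte: int) -> str:
    """Same construction as the one-liner above: strip '0b', reverse, pad right."""
    return bin(byte)[2:][::-1].ljust(8, "0")


print(bits_msb_first(0x49))  # 01001001
print(bits_lsb_first(0x49))  # 10010010
```

Which order is appropriate depends on what will consume the bit stream; the question itself does not say.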

Upvotes: -2

Chriszuma

Reputation: 4568

That is the binary. The values are stored as bytes, and when you print them, they are interpreted as ASCII characters.

You can use the bin() function and the ord() function to see the actual binary codes.

# data is the string returned by read() on a file opened with 'rb'
for value in data:
    print bin(ord(value))

Upvotes: 6

dkobozev

Reputation: 2325

Byte sequences in Python are represented using strings. The series of letters and symbols that you see when you print out a byte sequence is merely a printable representation of bytes that the string contains. To make use of this data, you usually manipulate it in some way to obtain a more useful representation.

You can use ord(x), hex(ord(x)), or bin(ord(x)) to obtain decimal, hexadecimal, and binary representations, respectively:

>>> f = open('/tmp/IMG_5982.JPG', 'rb')
>>> data = f.read(10)
>>> data
'\x00\x00II*\x00\x08\x00\x00\x00'

>>> data[2]
'I'

>>> ord(data[2])
73

>>> hex(ord(data[2]))
'0x49'

>>> bin(ord(data[2]))
'0b1001001'

>>> f.close()

The 'b' flag that you pass to open() does not tell Python anything about how to represent the file contents. From the docs:

Append 'b' to the mode to open the file in binary mode, on systems that differentiate between binary and text files; on systems that don’t have this distinction, adding the 'b' has no effect.

Unless you just want to look at what the binary data from the file looks like, Mark Pilgrim's book, Dive Into Python, has an example of working with binary file formats. The example shows how you can read ID3v1 tags from an MP3 file. The book's website seems to be down, so I'm linking to a mirror.
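To give a flavour of that kind of parsing, here is a rough sketch that uses the standard struct module. It is not the book's code. It assumes Python 3 and the commonly documented ID3v1 layout: a 128-byte block at the end of the file made up of the marker "TAG", 30 bytes each for title, artist and album, 4 bytes for the year, 30 bytes for a comment, and 1 genre byte. The function names are made up for illustration:

```python
import os
import struct

# Assumed ID3v1 layout: marker, title, artist, album, year, comment, genre.
ID3V1 = struct.Struct("<3s30s30s30s4s30sB")  # 128 bytes in total


def parse_id3v1(block: bytes):
    """Parse a 128-byte block; return a dict, or None if it is not a tag."""
    if len(block) != ID3V1.size:
        return None
    magic, title, artist, album, year, comment, genre = ID3V1.unpack(block)
    if magic != b"TAG":
        return None

    def text(raw: bytes) -> str:
        # Fields are padded with NUL bytes or spaces; latin-1 decodes any byte.
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {"title": text(title), "artist": text(artist), "album": text(album),
            "year": text(year), "comment": text(comment), "genre": genre}


def read_id3v1(path):
    """Read the last 128 bytes of the file at `path` and try to parse them."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() < ID3V1.size:
            return None  # file too short to hold a tag
        f.seek(-ID3V1.size, os.SEEK_END)
        return parse_id3v1(f.read(ID3V1.size))


# Synthetic block for demonstration; struct pads the short fields with NULs.
sample = ID3V1.pack(b"TAG", b"Song", b"Artist", b"Album", b"2011", b"", 12)
print(parse_id3v1(sample))
```

The point is that the raw bytes are rarely useful as zeros and ones; unpacking them into named fields according to the file format is usually the more practical step.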

Upvotes: 4

FogleBird

Reputation: 76862

Use ord(x) to get the integer value of each byte.

>>> with open('settings.dat', 'rb') as file:
...     data = file.read()
...
>>> for index, value in enumerate(data):
...     print '0x%08x 0x%02x' % (index, ord(value))
...
0x00000000 0x28
0x00000001 0x64
0x00000002 0x70
0x00000003 0x30
0x00000004 0x0d
0x00000005 0x0a
0x00000006 0x53
0x00000007 0x27
0x00000008 0x4d
0x00000009 0x41
0x0000000a 0x49
0x0000000b 0x4e
0x0000000c 0x5f
0x0000000d 0x57
0x0000000e 0x49
0x0000000f 0x4e
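The same offset/value listing can be produced by a small generator, which makes it easy to reuse on any byte sequence. This sketch assumes Python 3, so the ord() call is dropped, and the name hex_dump_lines is hypothetical:

```python
def hex_dump_lines(data: bytes):
    """Yield one 'offset value' line per byte, in the format used above."""
    for index, value in enumerate(data):
        # Assumption: Python 3, where `value` is already an int.
        yield "0x%08x 0x%02x" % (index, value)


# The first six bytes of the listing above, written as a bytes literal.
for line in hex_dump_lines(b"(dp0\r\n"):
    print(line)
```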

Upvotes: 0
