Howard Felix

Reputation: 1

Convert Python code to bytecode to binary and print out the binary in a list

import dis, struct

def func():
    a = 10
    b = 20
    c = a + b
    print(c)

code = dis.Bytecode(func)
packed = struct.pack('B' * code.size, *code)

binary_code = ''
for b in packed:
    binary_code += bin(b)[2:].zfill(8)

list_code = ''
for i in range(0, len(binary_code), 8):
    list_code += chr(int(binary_code[i:i+8], 2))

print(list_code)

Error:

Traceback (most recent call last):
    File "C:\Users\Public\Documents\Pies\Test.py", line 10, in <module>
    packed = struct.pack('B' * code.size, *code)
AttributeError: 'Bytecode' object has no attribute 'size'

When I use len instead of .size, I get a different error: the Bytecode object has no len().

packed = struct.pack('B' * len(code), *code)

TypeError: object of type 'Bytecode' has no len()

Upvotes: 0

Views: 426

Answers (2)

Howard Felix

Reputation: 1

text = 'Hello World'

ascii_codes = [ord(c) for c in text]

print(ascii_codes)

binary_numbers = [bin(c)[2:] for c in ascii_codes]

print(binary_numbers)

text = ''.join([chr(int(b, 2)) for b in binary_numbers])

print(text)

Upvotes: 0

Karl Knechtel

Reputation: 61519

AttributeError: 'Bytecode' object has no attribute 'size'

Well, yes: as the documentation will tell you, there isn't any .size to access. Similarly, len is also not supported.

This is because the Bytecode that you get back from dis.Bytecode(func) is not a raw sequence of bytes, instructions, or anything else.

It is a wrapper around the function's "code object", which is in turn a wrapper around the actual bytecode.

We don't need or want the wrapper from dis. We can directly access the function's code object:

>>> # Python 2.x
>>> # func.func_code
>>> # Python 3.x
>>> func.__code__
<code object func at 0x7f6b8aaf5d40, file "<stdin>", line 1>

And verify that this is the same thing that the dis module finds:

>>> func.__code__ is dis.Bytecode(func).codeobj
True

To get the actual bytecode from this code object, we need one more step:

>>> func.__code__.co_code
b'd\x01}\x00d\x02}\x01|\x00|\x01\x17\x00}\x02t\x00|\x02\x83\x01\x01\x00d\x00S\x00'

Notice that this is already a bytes object, i.e., the raw bytes of data that would be written into a .pyc file. Therefore, there is no reason to use struct to do any processing. It is already "packed".
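Since co_code is already a bytes object, the "binary in a list" that the question asks for can be produced directly from it. The following is a minimal sketch, not the only way to do it; the names raw, binary_list and restored are made up for illustration, and the actual byte values depend on the Python version.

```python
def func():
    a = 10
    b = 20
    c = a + b
    print(c)

# The raw bytecode of the function; iterating over bytes yields ints (0-255).
raw = func.__code__.co_code

# One 8-character binary string per byte, zero-padded on the left.
binary_list = [format(byte, '08b') for byte in raw]
print(binary_list)

# Round trip: turn the binary strings back into the original bytes.
restored = bytes(int(bits, 2) for bits in binary_list)
print(restored == raw)
```

Iterating over a bytes object gives integers, so neither struct nor chr is needed at any point.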


If, instead, we want a list of dis.Instruction objects (which represent individual codes in the function's bytecode) - yes, we can "unpack" these using *code; the dis.Bytecode is iterable, even though it doesn't have a .size or a len (and can't be indexed).

In order to work with the actual sequence of Instructions more easily, we can explicitly make a list of them first:

>>> code = dis.Bytecode(func)
>>> # Another way: 
>>> # [*code]
>>> list(code) 
[Instruction(opname='LOAD_CONST', opcode=100, arg=1, argval=10, argrepr='10', offset=0, starts_line=2, is_jump_target=False), Instruction(opname='STORE_FAST', opcode=125, arg=0, argval='a', argrepr='a', offset=2, starts_line=None, is_jump_target=False), Instruction(opname='LOAD_CONST', opcode=100, arg=2, argval=20, argrepr='20', offset=4, starts_line=3, is_jump_target=False), Instruction(opname='STORE_FAST', opcode=125, arg=1, argval='b', argrepr='b', offset=6, starts_line=None, is_jump_target=False), Instruction(opname='LOAD_FAST', opcode=124, arg=0, argval='a', argrepr='a', offset=8, starts_line=4, is_jump_target=False), Instruction(opname='LOAD_FAST', opcode=124, arg=1, argval='b', argrepr='b', offset=10, starts_line=None, is_jump_target=False), Instruction(opname='BINARY_ADD', opcode=23, arg=None, argval=None, argrepr='', offset=12, starts_line=None, is_jump_target=False), Instruction(opname='STORE_FAST', opcode=125, arg=2, argval='c', argrepr='c', offset=14, starts_line=None, is_jump_target=False), Instruction(opname='LOAD_GLOBAL', opcode=116, arg=0, argval='print', argrepr='print', offset=16, starts_line=5, is_jump_target=False), Instruction(opname='LOAD_FAST', opcode=124, arg=2, argval='c', argrepr='c', offset=18, starts_line=None, is_jump_target=False), Instruction(opname='CALL_FUNCTION', opcode=131, arg=1, argval=1, argrepr='', offset=20, starts_line=None, is_jump_target=False), Instruction(opname='POP_TOP', opcode=1, arg=None, argval=None, argrepr='', offset=22, starts_line=None, is_jump_target=False), Instruction(opname='LOAD_CONST', opcode=100, arg=0, argval=None, argrepr='None', offset=24, starts_line=None, is_jump_target=False), Instruction(opname='RETURN_VALUE', opcode=83, arg=None, argval=None, argrepr='', offset=26, starts_line=None, is_jump_target=False)]

These objects will not work with struct packing, because again they are wrappers around the raw data (designed to give you more information). They are also not designed to give back the corresponding underlying raw bytes - that will take more work, which involves understanding how the bytecode works at a low level. It will be much easier to work with the function's code object directly.
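If the goal is only to inspect the instructions rather than rebuild the bytes, the fields of each Instruction can be read as ordinary attributes. This is a rough sketch; the summary list is a name made up for illustration, and the opcodes and offsets you see will differ between Python versions.

```python
import dis

def func():
    a = 10
    b = 20
    c = a + b
    print(c)

# Each Instruction exposes its offset, numeric opcode, name and argument.
summary = [(ins.offset, ins.opcode, ins.opname, ins.arg)
           for ins in dis.Bytecode(func)]

for offset, opcode, opname, arg in summary:
    print(offset, opcode, opname, arg)
```

Note that arg is None for instructions that take no argument, which is one reason these tuples cannot simply be handed to struct.pack.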


Note also that the contents of the function's bytecode are not enough to be able to recreate the function. For example, some of these instructions are used to load hard-coded data (constants in the code, such as integer literals). These are stored separately. The function object also keeps a reference to its global-variable context.
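To see some of that separately stored data, you can look at other attributes of the code object and of the function itself. This is only an illustrative sketch; exactly which constants end up in co_consts is an implementation detail that varies between Python versions.

```python
def func():
    a = 10
    b = 20
    c = a + b
    print(c)

code_obj = func.__code__

# Constants referenced by the bytecode (contents vary by Python version).
print(code_obj.co_consts)

# Names of globals and attributes used by the bytecode, e.g. 'print'.
print(code_obj.co_names)

# Names of the local variables, in order of first use.
print(code_obj.co_varnames)

# The function's global namespace is held by the function, not the bytecode.
print(func.__globals__ is globals())
```

The bytecode refers to these entries by index, so the raw bytes on their own do not say which values or names are meant.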

Upvotes: 2
