lammy

Reputation: 467

Decompile an imported module (e.g. with uncompyle2)

My task is to export an imported (compiled) module loaded from a container.

I have a Python script that imports a module. Using print(module1), I can see that it is a compiled Python (.pyc) file loaded from an archive. As I cannot access the archive, my idea was to import the module and have it decompiled with uncompyle2.

This is my minimum code:

import os, sys
import uncompyle2
import module1
with open("module1.py", "wb") as fileobj:
    uncompyle2.uncompyle_file(module1, fileobj)

However, this produces an error. Substituting the actual path for module1 in the uncompyle_file call does not make a difference. The same snippet worked when the .pyc file was a single file in a directory rather than loaded from a container.

Error:

Traceback (most recent call last):
File "C:\....\run.py", line 64, in <module>
  uncompyle2.uncompyle_file(module1, fileobj)
File "C:\....\Python\python-2.7.6\lib\site-packages\uncompyle2\__init__.py", line 124, in uncompyle_file
  version, co = _load_module(filename)
File "C:\.....\Python\python-2.7.6\lib\site-packages\uncompyle2\__init__.py", line 67, in _load_module
  fp = open(filename, 'rb')
TypeError: coercing to Unicode: need string or buffer, module found

Does anyone know where I am going wrong?

Upvotes: 3

Views: 2406

Answers (2)

user4815162342

Reputation: 155416

You are going wrong with your initial assumption:

As I cannot access the archive, my idea was to import the module and have it decompiled with uncompyle2.

Uncompiling an already loaded module is unfortunately not possible. A loaded Python module is not a mirror of the on-disk representation of a .pyc file. Instead, it is a collection of objects created as a side effect of executing the code in the .pyc. Once the code has been executed, its byte code is discarded and it (in the general case) cannot be reconstructed.

As an example, consider the following Python module:

import gtk
w = gtk.Window(gtk.WINDOW_TOPLEVEL)
w.add(gtk.Label("A quick brown fox jumped over the lazy dog"))
w.show_all()

Importing this module inside an application that happens to run a GTK main loop will pop up a window with some text as a side effect. The module will have a dict with two entries, gtk pointing to the gtk module, and w pointing to an already created GTK window. There is no hint there how to create another GTK window of the sort, nor how to create another such module. (Remember that the object created might have been arbitrarily complex and that its creation could be a very involved process.)
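The same point can be seen without GTK. The following is only a minimal sketch: the source string, the module name demo, and the helper load_from_source are made up for illustration. It executes a tiny module body and then lists what the resulting module actually holds.

```python
import types

# Hypothetical module source; the names are illustrative only.
SOURCE = "import os\nvalue = sum(range(10))\n"

def load_from_source(name, source):
    """Create an empty module and run the compiled source in its namespace."""
    module = types.ModuleType(name)
    code = compile(source, name + ".py", "exec")
    exec(code, module.__dict__)
    return module

demo = load_from_source("demo", SOURCE)

# Only the results of running the code are left in the module's dict.
public_names = sorted(n for n in vars(demo) if not n.startswith("__"))
print(public_names)   # ['os', 'value']
print(demo.value)     # 45
```

The module's dict contains the finished value 45, not the expression sum(range(10)) that produced it.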

You might ask, then, if that is so, then what is the content of the pyc file? How did it get loaded the first time? The answer is that the pyc file contains an on-disk rendition of the byte-compiled code in the module, ready for execution. Creating a pyc file is roughly equivalent to doing something like:

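import marshal

def make_pyc(source_code, filename):
    # Byte-compile the source text into a code object.
    compiled = compile(source_code, filename, "exec")
    # Serialize the code object and write it out.
    # (A real .pyc also starts with a short header, omitted here.)
    serialized = marshal.dumps(compiled)
    with open(filename, "wb") as out:
        out.write(serialized)

# for example:
make_pyc("import gtk\nw = gtk.Window(gtk.WINDOW_TOPLEVEL)...",
         "somefile.pyc")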

On the other hand, loading a compiled module is approximately equivalent to:

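import sys, marshal, imp

def load_pyc(modname):
    # Read the serialized code object back from disk.
    with open(modname + ".pyc", "rb") as in_:
        serialized = in_.read()
    compiled = marshal.loads(serialized)
    # Create an empty module, register it, and run the code in its namespace.
    module = sys.modules[modname] = imp.new_module(modname)
    exec compiled in module.__dict__
    return module

load_pyc("somefile")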

Note how, once the code has been executed with the exec statement, the serialized string and the deserialized bytecode are no longer used and will be swept up by the garbage collector. The only remaining effect of the pyc having been loaded is the presence of a new module with living functions, classes, and other objects that are impossible to serialize, such as references to open files, network connections, OpenGL canvases, or GTK windows.
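The two snippets above leave out the short header that real .pyc files carry in front of the marshalled code. As a rough sketch only, the round trip can be tried on a real file written by py_compile. The file names and the function foo are placeholders. The header sizes used are 8 bytes on Python 2, 12 on Python 3.3 to 3.6, and 16 on Python 3.7 and later.

```python
import marshal
import os
import py_compile
import sys
import tempfile
import types

# Write a throwaway source file (placeholder names) and byte-compile it.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "test.py")
pyc_path = os.path.join(workdir, "test.pyc")
with open(src_path, "w") as src:
    src.write("def foo():\n    return 'hello world'\n")
py_compile.compile(src_path, cfile=pyc_path, doraise=True)

# The header (magic number plus metadata) precedes the marshalled code object.
if sys.version_info >= (3, 7):
    header_size = 16
elif sys.version_info >= (3, 3):
    header_size = 12
else:
    header_size = 8

with open(pyc_path, "rb") as pyc:
    header = pyc.read(header_size)
    code = marshal.loads(pyc.read())   # the module's code object

# Executing the code object is what populates a module.
module = types.ModuleType("test")
exec(code, module.__dict__)
result = module.foo()
print(result)
```

Here code is the kind of object a decompiler needs, while module is only what is left after running it.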

What modules like uncompyle2 do is the inverse of the compile function. You must have the actual code of the module (either serialized, as in a pyc file, or as a deserialized code object, like the compiled variable in the snippets above), from which uncompyle2 will produce a fairly faithful representation of the original source.
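One possible way to get such a code object for a module that lives in an archive, not covered above and offered only as a sketch, is to ask the module's loader for it. This assumes the module was imported through zipimport from an archive the process can still read. The archive bundle.zip and the module archived_demo are built here purely for illustration. zipimporter objects provide a get_code method that returns the code object for a module name.

```python
import importlib
import os
import sys
import tempfile
import types
import zipfile

# Build a throwaway archive containing one module (illustrative names).
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "bundle.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("archived_demo.py", "def foo():\n    return 'hello world'\n")

# Archives placed on sys.path are handled by zipimport.
sys.path.insert(0, archive)
archived_demo = importlib.import_module("archived_demo")

# __file__ points inside the archive, so it is not a path open() can read.
file_exists = os.path.exists(archived_demo.__file__)
print(file_exists)   # False

# The loader re-reads the archive and hands back the module's code object.
code = archived_demo.__loader__.get_code("archived_demo")
print(type(code))
```

This does not decompile the loaded module itself. It fetches the code from the archive a second time, so it only helps if the archive is still readable at that point.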

Upvotes: 1

Padraic Cunningham

Reputation: 180512

Pass the filename string first, and then the file object to write to:

with open("out.txt","w") as f:
    uncompyle2.uncompyle_file('path_to.pyc',f)

You can see the output:

with open("/home/padraic/test.pyc","rb") as f:
    print(f.read())
with open("out.txt","r+") as f:
    uncompyle2.uncompyle_file('/home/padraic/test.pyc',f)
    f.seek(0)
    print(f.read())

Output:

�
d�ZdS(cCs   dGHdS(Nshello world((((stest.pytfoosN(R(((stest.pyt<module>s

#Embedded file name: test.py


def foo():
    print 'hello world'

Upvotes: 0
