Reputation: 485
I want a Python script to be able to hash itself each time it is run. Is this possible without having to give the path to the script? I can see two ways to do this: the first is to hash the Python source file, and the second is to hash the compiled bytecode.
I see myself going with choice 2 so that raises a couple of other questions:
Upvotes: 4
Views: 2157
Reputation: 8106
One possible (untested) solution is to use the dis.dis() function from the disassembler module to convert a Python class or module (but not an instance) into a bytecode disassembly listing. Two identically written classes with different class names will produce identical listings, but this can be fixed by prepending cls.__name__ before running the combined string through md5.
Note that dis.dis() prints to stdout rather than returning a string, so there is the added step of capturing the printed output with StringIO:
_
_ >>> import dis, md5, sys
_ >>> from StringIO import StringIO
_ >>> class A(object):
_ ...     def __init__(self, item): print "A(%s)" % item
_ ...
_ >>> dis.dis(A)
_ Disassembly of __init__:
_ 2 0 LOAD_CONST 1 ('A(%s)')
_ 3 LOAD_FAST 1 (item)
_ 6 BINARY_MODULO
_ 7 PRINT_ITEM
_ 8 PRINT_NEWLINE
_ 9 LOAD_CONST 0 (None)
_ 12 RETURN_VALUE
_
_ >>> class B(A):
_ ...     def __init__(self, item): super(B, self).__init__(item); print "B(%s)" % item
_ ...
_ >>> dis.dis(B)
_ Disassembly of __init__:
_ 2 0 LOAD_GLOBAL 0 (super)
_ 3 LOAD_GLOBAL 1 (B)
_ 6 LOAD_FAST 0 (self)
_ 9 CALL_FUNCTION 2
_ 12 LOAD_ATTR 2 (__init__)
_ 15 LOAD_FAST 1 (item)
_ 18 CALL_FUNCTION 1
_ 21 POP_TOP
_ 22 LOAD_CONST 1 ('B(%s)')
_ 25 LOAD_FAST 1 (item)
_ 28 BINARY_MODULO
_ 29 PRINT_ITEM
_ 30 PRINT_NEWLINE
_ 31 LOAD_CONST 0 (None)
_ 34 RETURN_VALUE
_
_ >>> class Capturing(list):
_ ...     def __enter__(self):
_ ...         self._stdout = sys.stdout
_ ...         sys.stdout = self._stringio = StringIO()
_ ...         return self
_ ...     def __exit__(self, *args):
_ ...         self.extend(self._stringio.getvalue().splitlines())
_ ...         del self._stringio # free up some memory
_ ...         sys.stdout = self._stdout
_ ...
_ >>> with Capturing() as dis_output: dis.dis(A)
_ ...
_ >>> A_md5 = md5.new(A.__name__ + "\n".join(dis_output)).hexdigest()
_ >>> A_md5
_ '7818f1864b9cdf106b509906813e4ff8'
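For readers on Python 3, the same idea can be sketched without redirecting stdout, since dis.dis() accepts a file= argument there (Python 3.4 and later). The helper name disassembly_hash is hypothetical and not part of the answer above. Two caveats that I would expect to matter: the listing includes source line numbers, and nested code objects are printed with their memory addresses, so the digest may change when code moves or between runs.

```python
import dis
import hashlib
import io


def disassembly_hash(cls):
    """Hash a class name plus the text dis.dis() writes for that class.

    Hypothetical helper; a sketch of the approach described above.
    """
    buffer = io.StringIO()
    # Write the disassembly into the buffer instead of printing to stdout.
    dis.dis(cls, file=buffer)
    # Prepend the class name so identically written classes hash differently.
    text = cls.__name__ + "\n" + buffer.getvalue()
    return hashlib.md5(text.encode("utf-8")).hexdigest()


class A(object):
    def __init__(self, item):
        print("A(%s)" % item)


a_md5 = disassembly_hash(A)
```

The digest is deliberately not shown here, because the exact listing differs between Python versions.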
Upvotes: 1
Reputation: 1123410
A Python script can figure out its own path with:
import os
path = os.path.abspath(__file__)
after which you can open the source file and run it through hashlib.md5.
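As a minimal sketch of that step, the file can be read in binary mode and fed to hashlib in chunks. The helper names hash_file and hash_own_source are hypothetical, as are the algorithm and chunk_size parameters; md5 is used only because it is the digest mentioned above.

```python
import hashlib
import os


def hash_file(path, algorithm="md5", chunk_size=65536):
    """Return the hex digest of the file at path (hypothetical helper)."""
    digest = hashlib.new(algorithm)
    # Binary mode makes the digest cover the exact bytes on disk.
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_own_source(file_attr):
    """Hash the file that a module's __file__ value points at."""
    return hash_file(os.path.abspath(file_attr))
```

A script would then call hash_own_source(__file__) at startup, with no path supplied by the user.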
A script file has no compiled bytecode file; only modules do.
Note that in Python 2, the __file__ path uses the extension of the file that was actually loaded; for modules this is .pyc or .pyo only if there was a cached bytecode file ready to be reused. It is .py if Python had to compile the bytecode, either because no bytecode file was present or because the bytecode file was stale.
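If the goal is always to hash the source text, one way to cope with this is to map a .pyc or .pyo value of __file__ back to the neighbouring .py file first. This is a sketch under the assumption that the source file sits next to the cached bytecode, as in the Python 2 layout described above; source_path is a hypothetical helper name.

```python
import os


def source_path(file_attr):
    """Map a __file__ value to the matching .py path (hypothetical helper)."""
    path = os.path.abspath(file_attr)
    root, ext = os.path.splitext(path)
    if ext in (".pyc", ".pyo"):
        # Assumption: the source file lives beside the cached bytecode file.
        return root + ".py"
    return path
```

The result still has to exist on disk, so a deployment that ships only bytecode files would need a different approach.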
You'll also have to take into account that your code may have been invoked with command-line switches that alter what bytecode Python loads: if the -O or -OO switch is given, or the PYTHONOPTIMIZE environment variable is set, Python will load or compile to a .pyo file instead.
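The active optimization level can be read at run time from sys.flags.optimize, which reflects both the -O/-OO switches and PYTHONOPTIMIZE. The sketch below uses it to pick the bytecode extension under the Python 2 naming scheme described above; python2_bytecode_suffix is a hypothetical helper, and the mapping does not apply to Python 3.5 and later, where PEP 488 removed .pyo files.

```python
import sys


def python2_bytecode_suffix(optimize_level):
    """Return the Python 2 bytecode extension for an optimization level.

    Hypothetical helper: level 0 means no -O switch, 1 means -O, 2 means -OO.
    """
    return ".pyo" if optimize_level > 0 else ".pyc"


# Optimization level of the running interpreter.
current_level = sys.flags.optimize
expected_suffix = python2_bytecode_suffix(current_level)
```

Mixing current_level into the hashed data is one way to keep digests taken under different switches from being compared by accident.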
Upvotes: 7