starflyer

Reputation: 485

Is it possible for a Python script to md5 hash itself?

I want the script to hash itself each time it runs. Is this possible without having to give it the path to the script? I can see two ways to do this: the first is to hash the Python source file, and the second is to hash the compiled bytecode.

I see myself going with option 2, which raises a couple of other questions:

  1. Can a script determine where its compiled bytecode is from within the script?
  2. I'll ask this in a separate question.

Upvotes: 4

Views: 2157

Answers (2)

James McGuigan

Reputation: 8106

One possible (untested) solution is to use the disassembler module's dis.dis() to turn a Python class or module (but not an instance) into a bytecode disassembly listing. Two identically written classes with different class names will produce identical listings, but this can be fixed by prepending cls.__name__ to the combined string before running it through MD5.

Note that in Python 2, dis.dis() prints to stdout rather than returning a string, so there is the added step of capturing the printed output with StringIO.

>>> import dis, hashlib, sys
>>> from StringIO import StringIO
>>> class A(object):
...   def __init__(self, item): print "A(%s)" % item
...
>>> dis.dis(A)
Disassembly of __init__:
  2           0 LOAD_CONST               1 ('A(%s)')
              3 LOAD_FAST                1 (item)
              6 BINARY_MODULO
              7 PRINT_ITEM
              8 PRINT_NEWLINE
              9 LOAD_CONST               0 (None)
             12 RETURN_VALUE

>>> class B(A):
...   def __init__(self, item): super(B, self).__init__(item); print "B(%s)" % item
...
>>> dis.dis(B)
Disassembly of __init__:
  2           0 LOAD_GLOBAL              0 (super)
              3 LOAD_GLOBAL              1 (B)
              6 LOAD_FAST                0 (self)
              9 CALL_FUNCTION            2
             12 LOAD_ATTR                2 (__init__)
             15 LOAD_FAST                1 (item)
             18 CALL_FUNCTION            1
             21 POP_TOP
             22 LOAD_CONST               1 ('B(%s)')
             25 LOAD_FAST                1 (item)
             28 BINARY_MODULO
             29 PRINT_ITEM
             30 PRINT_NEWLINE
             31 LOAD_CONST               0 (None)
             34 RETURN_VALUE

>>> class Capturing(list):
...     def __enter__(self):
...         self._stdout = sys.stdout
...         sys.stdout = self._stringio = StringIO()
...         return self
...     def __exit__(self, *args):
...         self.extend(self._stringio.getvalue().splitlines())
...         del self._stringio    # free up some memory
...         sys.stdout = self._stdout
...
>>> with Capturing() as dis_output: dis.dis(A)
...
>>> A_md5 = hashlib.md5(A.__name__ + "\n".join(dis_output)).hexdigest()
>>> A_md5
'7818f1864b9cdf106b509906813e4ff8'
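On Python 3.4 and later, the stdout redirection should not be needed, because dis.dis() accepts a file= argument. Below is a minimal sketch of the same idea under that assumption; the helper name dis_md5 is hypothetical, and since bytecode changes between interpreter releases, a digest is only comparable with others produced by the same Python version.

```python
import dis
import hashlib
import io


def dis_md5(obj):
    """Return the MD5 hex digest of obj's name plus its dis() listing.

    Hypothetical helper; assumes Python 3.4+ for dis.dis(..., file=...).
    The listing includes source line numbers and is version-specific.
    """
    buf = io.StringIO()
    dis.dis(obj, file=buf)  # write the disassembly into buf instead of stdout
    # Prefix the name so identically written classes get different digests.
    text = obj.__name__ + "\n" + buf.getvalue()
    return hashlib.md5(text.encode("utf-8")).hexdigest()


class A(object):
    def __init__(self, item):
        print("A(%s)" % item)


class B(A):
    def __init__(self, item):
        super(B, self).__init__(item)
        print("B(%s)" % item)


print(dis_md5(A))
print(dis_md5(B))
```

One caveat with this approach: a method containing a lambda or nested function disassembles to a listing that includes the nested code object's repr, which contains a memory address, so its digest may change from run to run.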

Upvotes: 1

Martijn Pieters

Reputation: 1123410

A Python script can determine its own path with:

import os

path = os.path.abspath(__file__)

after which you can open the source file and run it through hashlib.md5.
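A minimal sketch of that approach, reading the file in binary chunks so the digest covers the exact bytes on disk. The helper names md5_of_file and own_source_md5 are hypothetical, and own_source_md5 assumes __file__ is defined (it is not in an interactive session).

```python
import hashlib
import os


def md5_of_file(path, chunk_size=65536):
    """Return the MD5 hex digest of the file at path, read in binary chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def own_source_md5():
    """Hash the file this code was loaded from.

    Assumes __file__ is defined and points at the .py source.
    """
    return md5_of_file(os.path.abspath(__file__))
```

Calling print(own_source_md5()) from within the script then reports the digest of its own file on every run.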

A script run directly (as the __main__ module) has no compiled bytecode file; only imported modules do.

Note that in Python 2, the __file__ path uses the extension of the file that was actually loaded; for modules this is .pyc or .pyo only if there was a cached bytecode file ready to be reused. It is .py if Python had to compile the bytecode, either because no bytecode file was present or because the bytecode file was stale.
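If you want to hash the source even when __file__ names a .pyc or .pyo file, one option is to map the path back to the .py file beside it. This is a sketch that assumes the Python 2 layout, where bytecode sits next to the source; the helper name source_path is hypothetical.

```python
import os


def source_path(path):
    """Map a .pyc/.pyo path back to the .py file next to it.

    Assumes the Python 2 layout (foo.py -> foo.pyc or foo.pyo in the
    same directory). Any other path is returned unchanged.
    """
    root, ext = os.path.splitext(path)
    if ext in (".pyc", ".pyo"):
        return root + ".py"
    return path
```

For example, source_path(os.path.abspath(__file__)) gives a path that can be opened and run through hashlib.md5 regardless of which file Python actually loaded.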

You'll also have to take into account whether your code was invoked with command-line switches that alter which bytecode Python loads; if the -O or -OO switch is given, or the PYTHONOPTIMIZE environment variable is set, Python will load or compile to a .pyo file instead.
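A sketch of how a module might guess which bytecode file applies, taking the optimization level into account. It assumes either Python 2 (foo.pyc, or foo.pyo under -O/-OO) or Python 3.5+, where importlib.util.cache_from_source() computes the __pycache__ path, including any .opt-1/.opt-2 suffix, for the current interpreter's optimization level. The helper name expected_bytecode_path is hypothetical, and the function only computes a path; it does not check that the file exists.

```python
import sys


def expected_bytecode_path(source_path):
    """Guess which bytecode file Python would use for source_path.

    Assumes Python 2 or Python 3.5+; Python 3.0-3.4 used other rules.
    """
    if sys.version_info >= (3, 5):
        import importlib.util
        # optimization=None (the default) uses the interpreter's own level.
        return importlib.util.cache_from_source(source_path)
    # Python 2: sys.flags.optimize is nonzero under -O, -OO or PYTHONOPTIMIZE.
    return source_path + ("o" if sys.flags.optimize else "c")
```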

Upvotes: 7
