Nguyen Diep

Reputation: 85

How to mock MD5 hash collision

I am using a private Ubuntu server and am testing a private application.

I am using the Python hashlib library for generating MD5 hashes.

Now I want the MD5 function to always return a specific value of my choosing, regardless of the input. How can I do this? Is there a config file?

Upvotes: 0

Views: 2321

Answers (2)

techydesigner

Reputation: 1691

tl;dr You can't, unless you write your own function or implement a monkey patch.

Hashes are not designed to return the same value for completely different inputs (although collisions inevitably exist because the output has a fixed length, and for MD5 they can even be produced deliberately because of known weaknesses in the algorithm). You could write your own function that checks the value passed in and returns a fixed hash when you need it to. An example:

import hashlib

def my_func(thing):
    # Both "cheese" and "football" deliberately map to the same hash.
    hash_for_cheese = 'fea0f1f6fede90bd0a925b4194deac11'
    if thing in ("cheese", "football"):
        return hash_for_cheese
    # Everything else gets its real MD5 (hashlib needs bytes, so encode the str).
    return hashlib.md5(thing.encode()).hexdigest()

Here, passing "cheese" or "football" returns the same hash; any other input returns its real MD5 hash.

Also, there is no 'config file'. MD5 is a fixed algorithm implemented in C (CPython's hashlib normally uses OpenSSL, with a bundled implementation as a fallback). If you are desperate, you might be able to change it, but it would only work on your system.

You could also implement what is known as a monkey patch. I'm not knowledgeable in that area, but you can find out more information from this SO post.
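For test code specifically, the standard library's `unittest.mock.patch` is one way to apply such a monkey patch only for the duration of a test and have it undone automatically. This is a minimal sketch, assuming the code under test calls `hashlib.md5(...).hexdigest()` through the `hashlib` module; `FORCED_HASH` and `code_under_test` are hypothetical names used only for illustration:

```python
import hashlib
from unittest import mock

# Hypothetical value every patched hexdigest() call should return.
FORCED_HASH = "0" * 32
real_md5 = hashlib.md5


def code_under_test(data):
    # Hypothetical application code; it looks md5 up through the hashlib module.
    return hashlib.md5(data).hexdigest()


with mock.patch("hashlib.md5") as fake_md5:
    # hashlib.md5(...) now returns fake_md5.return_value, whose
    # hexdigest() is configured to return the forced value.
    fake_md5.return_value.hexdigest.return_value = FORCED_HASH
    assert code_under_test(b"cheese") == FORCED_HASH
    assert code_under_test(b"football") == FORCED_HASH

# Leaving the with-block restores the real function.
assert hashlib.md5 is real_md5
```

Because the patch is scoped to the `with` block, the rest of the program keeps the real MD5.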

As others have pointed out, I cannot think of a use case for this sort of problem, but if you need to do it, then you have your answer.

Upvotes: -1

sberry

Reputation: 132138

DISCLAIMER

As mentioned in the comments, this is most likely a TERRIBLE idea and very likely an XY problem.


For the sake of clarity, this is what I was referring to when I said it could be done via a monkey patch:

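import hashlib

class DummyMD5:

    def __init__(self, realmd5, v=b""):
        self.md5 = realmd5
        self.v = v

    def hexdigest(self):
        # Always return this constant, whatever the input was.
        return "12345abcdef"

    def __call__(self, v=b""):
        # Return a fresh wrapper per call, so two hashes created one after
        # the other don't overwrite each other's input.
        return DummyMD5(self.md5, v)

    def __getattr__(self, f):
        # Only reached for attributes not defined above (e.g. digest),
        # which are delegated to a real md5 object over the same input.
        # Note: update() is delegated to a throwaway object, so it has no effect.
        return getattr(self.md5(self.v), f)


_md5 = hashlib.md5
hashlib.md5 = DummyMD5(_md5)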

As long as this is imported / executed before other code calls hashlib.md5, every hexdigest() call will return the constant value. Any other method (such as digest()) still returns the real MD5 result.

Upvotes: 4
