Reputation: 103
For a demo project, I want to create a hashing function with a very high probability of collision. Something simple is fine since the aim of the project is NOT security - but to demonstrate hash collisions.
Can anyone help me get started with an algorithm, or a sample implementation, or just point me in the right direction?
I am doing this in Python, though maybe that should not matter.
Upvotes: 1
Views: 315
Reputation: 2240
One algorithm that comes to mind is hashing using the first letter of the string.
Something like
hash[ord(text[0]) - ord('a')] = text
So anything starting with the same letter will be hashed together. As you can see, that's a lot of collisions.
Another idea is to hash according to the length of the string.
hash[len(text)] = text
You can use what hayden suggests in a comment above, and cause further collisions by taking the length modulo some number. Eg.
hash[len(text) % 5] = text
Upvotes: 0
Reputation: 184071
You could use the sum of the characters in a string. It's the first hash function I was taught back when I was first learning BASIC in high school, and I ran into the collision problem right away and had to figure out how to deal with it.
sum(ord(c) for c in text)
Transpositions are easily achieved by swapping strings or even words. For more fun you could also make it case-insensitive:
sum(ord(c) for c in text.lower())
I'll even give you a sample collision for that last one: Jerry Kindall -> Dillan Kyrjer :-)
Upvotes: 4