Reputation: 1
I am creating a rainbow table with strings and their hashes separated by spaces, one pair per line. The table looks like this:
j)O 3be44b195706cdd25e29d2b01a0e88d4
j)P a83079350701398672677a9ffe07108c
j)Q 2952c4654c127f2bb1086b75d8f1f986
j)R 6621ec6e1ba3c3669259894db8cde339
j)S 0442a2ee045e1913cd2eb094e8945399
I want to know how I can make a Python program to search for a string and find its hash, or vice versa.
I have made it search the whole document, but I want it to only search a specific column.
I used pandas, and I can now search in a specific column, but I want it to only find exact matches:
import pandas as pd
from termcolor import colored

working_table = pd.read_csv('rainbow_table_md5.txt', sep=' ', names=["string", "hash"])
print(working_table['hash'].where(working_table['string'] == input(colored("String: ", 'cyan'))))
The code right now outputs this:
String: a
0 0cc175b9c0f1b6a831c399e269772661
1 NaN
2 NaN
...
14094701 NaN
14094702 NaN
Name: hash, Length: 14094731, dtype: object
I don't need any of the lines other than the match in row 0; ideally the output would be just the hash.
Upvotes: 0
Views: 1149
Reputation: 20415
You want "lookup" rather than "search", since only an exact match matters. Pandas might be overkill for this application. A pair of dictionaries suffices:
class Rainbow:

    def __init__(self, infile, k=20):
        # Map each plaintext string to its full hash.
        self.s_to_hash = {s: h
                          for s, h in self._read_tuples(infile)}
        # Map a k-character hash prefix back to its plaintext.
        self.hash_to_s = {h[:k]: s
                          for s, h in self.s_to_hash.items()}
        self.k = k

    @staticmethod
    def _read_tuples(infile):
        with open(infile) as fin:
            for line in fin:
                s, h = line.strip().split()
                yield s, h
Choosing k < 32 is an attempt to save some memory, at the (small) risk of two hashes colliding on a common prefix. Tune it up or down to taste, based on your memory, table size, and appetite for collision risk.
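As a rough sketch of how you might use it, assuming the two-column rainbow_table_md5.txt format from the question:

rainbow = Rainbow('rainbow_table_md5.txt')

# Forward lookup: plaintext -> hash.
print(rainbow.s_to_hash['j)O'])              # 3be44b195706cdd25e29d2b01a0e88d4

# Reverse lookup: truncate the query hash to k characters first,
# since only the first k characters are stored as keys.
digest = '3be44b195706cdd25e29d2b01a0e88d4'
print(rainbow.hash_to_s[digest[:rainbow.k]]) # j)O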
Consider writing a getter function and then making hash_to_s private.
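A minimal sketch of such a getter, added as a method on the Rainbow class above (the name get_string and the None-on-miss behavior are illustrative choices, not anything prescribed):

    def get_string(self, digest):
        """Return the plaintext for a hex digest, or None if it is unknown."""
        # Keys hold only the first k characters of each hash,
        # so truncate the caller's digest to match.
        return self.hash_to_s.get(digest[:self.k])

With that in place, hash_to_s could be renamed _hash_to_s so callers go through the getter.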
Storing raw bytes would be twice as memory-efficient as storing ASCII hex nybbles.
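For example, one way to do that (a sketch, not part of the code above) is to convert each hex digest with bytes.fromhex before using it as a key:

hex_digest = '3be44b195706cdd25e29d2b01a0e88d4'

raw = bytes.fromhex(hex_digest)   # 16 bytes instead of 32 ASCII characters
assert raw.hex() == hex_digest    # round-trips losslessly

# e.g. in __init__, keys become bytes.fromhex(h)[:k // 2],
# since k hex characters correspond to k // 2 bytes.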
Upvotes: 0