FRANKfisher
FRANKfisher

Reputation: 121

SHA256 Find Partial Collision

I have two message:

messageA: "Frank is one of the "best" students topicId{} "

messageB: "Frank is one of the "top" students topicId{} "

I need to find SHA256 partially collision of these two messages(8 digits). Therefore, The first 8 digests of SHA256(messageA) == The first 8 digest of SHA256(messageB)

We can put any letters and numbers in {}, Both {} should have same string

I have tried brute force and birthday attack with hash table to solve this problem, but it costs too much time. I know the cycle detection algorithm like Floyd and Brent, however i have no idea how to construct the cycle for this problem. Are there any other methods to solve this problem? Thank you so much!

Upvotes: 1

Views: 5460

Answers (2)

MurgleDreeBrah
MurgleDreeBrah

Reputation: 841

I'm assuming the space at the end of the strings in the question was intentional so I left it in.

"Frank is one of the "top" students topicId{59220691223} " 6026d9b323898bcd7ecdbcbcd575b0a1d9dc22fd9e60074aefcbaade494a50ae

"Frank is one of the "best" students topicId{59220691223} " 6026d9b31ba780bb9973e7cfc8c9f74a35b54448d441a61cc9bf8db0fcae5280

It actually took about 7 billion tries to find one using brute force, a lot more than I expected.

I figure 2^32 is roughly 4.3 billion and so chance of not finding any match after 4.3 billion tries is about 36.78%

(1 - 2^-32)^32 = 36.78%

I actually found a match after about 7 billion tries, there was less than a 20% chance of no matches in 7 billion tries.

(1 - 2^-32)^(7 * 10^10) = 19.59%

This is the C++ code I used running on 7 threads, each thread gets a different starting point and it quits once a match is found on any thread. Each thread also updates its progress to cout every 1 million attempts.

I've fast forwarded to where the match was found on threadId=5, so it takes less than a minute to run. But if you change the starting point you can look for other matches.

And I'm not sure either how one would use Floyd and Brent since the strings have to use the same topicId so you are locked in on both the prefix and suffix.

/*
To compile go get picosha2 header file from https://github.com/okdshin/PicoSHA2 
Copy this code into same directory as picosha2.h file, save it as hash.cpp for example.
On Linux go to command line and cd to directory where these files are. 

To compile it:
g++ -O2 -o hash hash.cpp -l pthread

And run it:
./hash

*/

#include <iostream>
#include <string>
#include <thread>
#include <mutex>

// I used picoSHA2 header only file for the hashing
// https://github.com/okdshin/PicoSHA2
#include "picosha2.h"


// return 1st 4 bytes (8 chars) of SHA256 hash
std::string hash8(const std::string& src_str) {
    std::vector<unsigned char> hash(picosha2::k_digest_size);
    picosha2::hash256(src_str.begin(), src_str.end(), hash.begin(), hash.end());
    return picosha2::bytes_to_hex_string(hash.begin(), hash.begin() + 4);
}

bool done = false;
std::mutex mtxCout;

void work(unsigned long long threadId) {
    std::string a = "Frank is one of the \"best\" students topicId{",
        b = "Frank is one of the \"top\" students topicId{";
        
    // Each thread gets a different starting point, I've fast forwarded to the part 
    // where I found the match so this won't take long to run if you try it, < 1 minute.
    // If you want to run a while drop the last "+ 150000000ULL" term and it will run 
    // for about 1 billion total (150 million each thread, assuming 7 threads) take 
    // about 30 minutes on Linux.
    // Collision occurred on threadId = 5, so if you change it to use less than 6 threads  
    // then your mileage may vary.
    
    unsigned long long start = threadId * (11666666667ULL + 147000000ULL) + 150000000ULL;
    unsigned long long x = start;
    
    for (;;) {
        // Not concerned with making the reading/updating "done" flag atomic, unlikely
        // 2 collisions are found at once on separate threads, and writing to cout 
        // is guarded anyway.
        
        if (done) return;
        std::string xs = std::to_string(x++);
        std::string hashA = hash8(a + xs + "} "), hashB = hash8(b + xs + "} ");
        
        if (hashA == hashB) {
            std::lock_guard<std::mutex> lock(mtxCout);
            std::cout << "*** SOLVED ***" << std::endl;
            std::cout << (x-1) << std::endl;
            std::cout << "\"" << a << (x - 1) << "} \" = " << hashA << std::endl;
            std::cout << "\"" << b << (x - 1) << "} \"  = " << hashB << std::endl;
            done = true;
            return;
        }
        
        if (((x - start) % 1000000ULL) == 0) {
            std::lock_guard<std::mutex> lock(mtxCout);
            std::cout << "thread: " << threadId << " = " << (x-start) 
                << " tries so far" << std::endl;
        }
    }
}

void runBruteForce() {
    const int NUM_THREADS = 7;
    std::thread threads[NUM_THREADS];
    for (int i = 0; i < NUM_THREADS; i++) threads[i] = std::thread(work, i);
    for (int i = 0; i < NUM_THREADS; i++) threads[i].join();
}

int main(int argc, char** argv) {
    runBruteForce();
    return 0;
}

Upvotes: 2

r3mainer
r3mainer

Reputation: 24547

This is pretty trivial to solve with a birthday attack. Here's how I did it in Python (v2):

def find_collision(ntries):
    from hashlib import sha256
    str1 = 'Frank is one of the "best" students topicId{%d} '
    str2 = 'Frank is one of the "top" students topicId{%d} '
    seen = {}
    for n in xrange(ntries):
        h = sha256(str1 % n).digest()[:4].encode('hex')
        seen[h] = n
    for n in xrange(ntries):
        h = sha256(str2 % n).digest()[:4].encode('hex')
        if h in seen:
            print str1 % seen[h]
            print str2 % n

find_collision(100000)

If your attempt took too long to find a solution, then either you simply made a mistake in your coding somewhere, or you were using the wrong data type.

Python's dictionary data type is implemented using hash tables. That means you can search for dictionary elements in constant time. If you implemented seen using a list instead of a dict in the above code, then the search at line 11 would take an awful lot longer.


Edit:

If the two topicId tokens have to be identical, then — as pointed out in the comments — there is little option but to grind through somewhere in the order of 231 values. You will find a collision eventually, but it could take a long time.

Just leave this running overnight and with a bit of luck you'll have an answer in the morning:

def find_collision():
    from hashlib import sha256
    str1 = 'Frank is one of the "best" students topicId{%x} '
    str2 = 'Frank is one of the "top" students topicId{%x} '
    seen = {}
    n = 0
    while True:
        if sha256(str1 % n).digest()[:4] == sha256(str2 % n).digest()[:4]:
            print str1 % n
            print str2 % n
            break
        n += 1

find_collision()

If you're in a hurry, you could maybe look into using a GPU to speed up the hash calculations.

Upvotes: 3

Related Questions