o.o
o.o

Reputation: 3751

Randomized algorithm for string matching

Question:

Given a text t[1...n, 1...n] and p[1...m, 1...m], n = 2m, from alphabet [0, Sigma-1], we say p matches t at [i,j] if t[i+k-1, j+L-1] = p[k,L] for all k,L. Design a randomized algorithm to find all matches in O(n^2) time with high probability.

Image:

enter image description here

Can someone help me understand what this text means? I believe it is saying that 't' has two words in it and the pattern is also two words but the length of both patterns is half of 't'. However, from here I don't understand how the range of [i,j] comes into play. That if statement goes over my head.

This could also be saying that t and p are 2D arrays and you are trying to match a "box" from the pattern in the t 2D array.

Any help would be appreciated, thank you!

Upvotes: 0

Views: 916

Answers (1)

uSeemSurprised
uSeemSurprised

Reputation: 1834

The problem asks you to find a 2D pattern i.e defined by the p array in the t array which is also 2D.

The most obvious randomized solution to this problem would be to generate two random indexes i and j and then start searching for the pattern from that (i, j).

To avoid doing redundant searches you can keep track of which pairs of (i, j) you have visited before, this can be done using a simple look up 2D array.

The complexity of above would be O(n^3) in the worst case.


You can also use hashing for comparing the strings to reduce the complexity to O(n^2).

You first need to hash the t array row by row and store the value in an array like hastT, you can use the Rolling hash algorithm for that.

You can then hash the p array using Rolling hash algorithm and store the hashes row by row in the array hashP.

Then when you generate the random pair (i, j), you can get the hash of the corresponding t array using the array hashT in linear time instead of the brute force comparision that takes quadratic time and compare (Note there can be collisions in the hash you can brute force when a hash matches to be completely sure).

To find the corresponding hash using the hashT we can do the following, suppose the current pair (i, j) is (3, 4), and the dimensions of the p array are 2 x 3.

Then we can compare hashT[3][7] - hash[3][3] == hashP[3] to find the result, the above logic comes from the rolling hash algo.

Pseudocode for search in linear time using hashing :

hashT[][], hashS[]

i = rand(), j = rand();

for(int k = i;k < i + lengthOfColumn(p);i++){
    if((hashT[i][j + lengthOfRow(p)] - hashT[i][j-1]) != hashP[i]){
        //patter does not match.
        return false;
    }
}

Upvotes: 2

Related Questions