Nikhil Bhawsar
Nikhil Bhawsar

Reputation: 61

Implementation of z algorithm

Since 4 days I was reading about strings and some algo for pattern matching and for that I got KMP searching algo and it was good, but I also got that there is another method for string matching which is the same as KMP in space and time complexity, but has an easy solution.

The algorithm was Z-algorithm.

So for that I searched google but I did not find a good explanation for the algo. Can you please explain how to create pattern array and how to apply search procedure? That would be good if you will provide code in c++.

Upvotes: 3

Views: 3764

Answers (2)

Murali Manohar
Murali Manohar

Reputation: 339

After long time I understood how to build Z array. I will explain here in easy words.

Lets first understand what is prefix:

Example :

  1. In word apple the prefix can be apple (or) appl (or) app (or) ap (or) a.

  2. In word banana the prefix can be banana (or) banan (or) bana (or) ban (or) ba (or) b.

Explanation: Any substring S of a string T that matches from starting of the string T till end of string T or before the end is called as prefix.

Hope You understood what is prefix here.

Lets see how to build Z array.

Lets take this example text : a a b $ b a a b a a

Index : 0 1 2 3 4 5 6 7 8 9

Text : a a b $ b a a b a a

Z value : x 1 0 0 0 3 1 0 2 1

Note:

a. Substring should start from ith position.

b. substring should be of maximum length which is also a prefix

  1. At index 0:

Finding substring from i(0th index) till end which is also prefix of given text.

a a b $ b a a b a a => of length 10 is the longest substring which is also a prefix of the text. But this will not help in pattern matching so we make it as x in Z array.

  1. At index 1:

Finding longest substring starting from position 1 till end which is also a prefix of the text.

such substring are:

a. "a" => prefix of the text "a a b $ b a a b a a" and length is 1.

b. "a b" => Not a prefix

c. "a b $" => Not a prefix

d. "a b $ b" => Not a prefix

e. "a b $ b a" => Not a prefix

and so...

Here the only longest substring which is also a prefix is "a" and its length is 1. It is stored in Z array.

  1. At index 2:

Substrings are:

a. "b" => Not a prefix

b. "b $" => Not a prefix

c. "b $ b" => Not a prefix

and so...

Here there is no substring which is also a prefix of text T. So we are storing zero at index 2 in Z array.

  1. At index 5:

substrings are:

a. "a" => prefix of text "a a b $ b a a b a a" and length is 1.

b. "a a" => prefix of text "a a b $ b a a b a a" and length is 2.

c. "a a b" => prefix of text "a a b $ b a a b a a" and length is 3.

d. "a a b a"=> Not a prefix

and so...

Here the longest substring which is also a prefix of text T is "a a b" of length 3. So we store 3 at index 5 in Z array.

Finally, If any value in Z array is same as the length of the pattern then that pattern is present in the text T.

Upvotes: 8

RATHI
RATHI

Reputation: 5289

In Z-algo, we construct a Z array.

What is Z Array? For a string str[0..n-1], Z array is of same length as string. An element Z[i] of Z array stores length of the longest substring starting from str[i] which is also a prefix of str[0..n-1]. The first entry of Z array is meaning less as complete string is always prefix of itself.

> Example: Index            0   1   2   3   4   5   6   7   8   9  10  11
> Text                      a   a   b   c   a   a   b   x   a   a   a   z   
> values         X          1   0   0   3   1   0   0   2   2   1   0  More
> Examples: str  = "aaaaaa" Z[]  = {x, 5, 4, 3, 2, 1}
> 
> str = "aabaacd" Z[] = {x, 1, 0, 2, 1, 0, 0}
> 
> str = "abababab" Z[] = {x, 0, 6, 0, 4, 0, 2, 0}

The idea is to concatenate pattern and text, and create a string “P$T” where P is pattern, $ is a special character should not be present in pattern and text, and T is text. Build the Z array for concatenated string. In Z array, if Z value at any point is equal to pattern length, then pattern is present at that point.

Example:
Pattern P = "aab",  Text T = "baabaa"

The concatenated string is = "aab$baabaa"

Z array for above concatenated string is {x, 1, 0, 0, 0, 
                                          3, 1, 0, 2, 1}.
Since length of pattern is 3, the value 3 in Z array 
indicates presence of pattern. 

Detailed explanation and Implementation you can find here

Upvotes: 1

Related Questions