Avinash
Avinash

Reputation: 13267

Longest prefix string length for all the suffixes

Find the length of the longest prefix string for all the suffixes of the string.

For example suffixes of the string ababaa are ababaa, babaa, abaa, baa, aa and a. The similarities of each of these strings with the string "ababaa" are 6,0,3,0,1,1 respectively. Thus the answer is 6 + 0 + 3 + 0 + 1 + 1 = 11.

I wrote following code

#include <iostream>
#include <string.h>

#include <stdio.h>
#include <time.h>

int main ( int argc, char **argv) {
    size_t T;
    std::cin >> T;
    char input[100000];
    for ( register size_t i = 0; i < T; ++i) {
        std::cin >> input;

        double t = clock();

        size_t len    = strlen(input);
        char *left    = input;
        char *right   = input + len - 1;
        long long sol = 0;
        int end_count = 1;
        while ( left < right ) {
            if ( *right != '\0') {
                if ( *left++ == *right++ ) {
                    sol++;
                    continue;
                }
            }
            end_count++;
            left = input; // reset the left pointer
            right = input + len - end_count; // set right to one left.
        }
        std::cout << sol + len << std::endl;
        printf("time= %.3fs\n", (clock() - t) / (double)(CLOCKS_PER_SEC));
    }
}

Working fine, but for a string which is 100000 long and having same character i.e. aaaaaaaaaa.......a, it is taking long time , how can i optimize this one more.

Upvotes: 0

Views: 5142

Answers (6)

Eugene Baranchuk
Eugene Baranchuk

Reputation: 91

Use Z algorithm to calculate length of all substrings, which also prefixes in O(n) and then scan resulting array and sum its values.

Reference: https://www.geeksforgeeks.org/sum-of-similarities-of-string-with-all-of-its-suffixes/

Upvotes: 1

xyz
xyz

Reputation: 201

Here a java implementation:

        // sprefix
        String s = "abababa";
        Vector<Integer>[] v = new Vector[s.length()];
        int sPrefix = s.length();
        v[0] = new Vector<Integer>();
        v[0].add(new Integer(0));
        for(int j = 1; j < s.length(); j++)
        {
            v[j] = new Vector<Integer>();
            v[j].add(new Integer(0));
            for(int k = 0; k < v[j - 1].size(); k++)
                if(s.charAt(j) == s.charAt(v[j - 1].get(k)))
                {
                    v[j].add(v[j - 1].get(k) + 1);
                    v[j - 1].set(k, 0);
                }
        }

        for(int j = 0; j < v.length; j++)
            for(int k = 0; k < v[j].size(); k++)
                sPrefix += v[j].get(k);

        System.out.println("Result = " + sPrefix);

Upvotes: 0

wilx
wilx

Reputation: 18268

Let's say your ababaa is a pattern P. I think you could use the following algorithm:

  1. Create a suffix automata for all possible suffixes of P.
  2. Walk the automata using P as input, count edges traversed so far. For each accepting state of the automata add the current edge count to total sum. Walk the automata until you either reach the end of the input or there are no more edges to go through.
  3. The total sum is the result.

Upvotes: 1

tblum
tblum

Reputation: 153

You can use Suffix Array: http://en.wikipedia.org/wiki/Suffix_array

Upvotes: 3

duedl0r
duedl0r

Reputation: 9424

I'm not sure whether a Trie gives you much performance gain.. but I would certainly think about it.

The other idea I had is to try to compress your string. I didn't really think about it, just a crazy idea...

if you have a string like this: ababaa compress it maybe to: abab2a. Then you have to come up with a technique where you can use your algorithm with those strings. The advantage is you can then compare long strings 100000a efficiently with each other. Or more importantly: you can calculate your sum very fast.

But again, I didn't think it through, maybe this is a very bad idea ;)

Upvotes: 0

Shamim Hafiz - MSFT
Shamim Hafiz - MSFT

Reputation: 22124

From what I see, you are using plain array to evaluate the suffix and though it may turn out to be efficient for some data set, it would fail to be efficient for some cases, such as the one you mentioned.

You would need to implement a Prefix-Tree or Trie like Data Structure. The code for those aren't straightforward, so if you are not familiar with them, I would suggest you read a little bit about them.

Upvotes: 0

Related Questions