discobot
discobot

Reputation: 31

Find longest suffix of string in given array

Given a string and array of strings find the longest suffix of string in array.

for example

string = google.com.tr

array = tr, nic.tr, gov.nic.tr, org.tr, com.tr

returns com.tr

I have tried to use binary search with specific comparator, but failed.

C-code would be welcome.

Edit:

I should have said that im looking for a solution where i can do as much work as i can in preparation step (when i only have a array of suffixes, and i can sort it in every way possible, build any data-structure around it etc..), and than for given string find its suffix in this array as fast as possible. Also i know that i can build a trie out of this array, and probably this will give me best performance possible, BUT im very lazy and keeping a trie in raw C in huge peace of tangled enterprise code is no fun at all. So some binsearch-like approach will be very welcome.

Upvotes: 2

Views: 1197

Answers (5)

Ninja420
Ninja420

Reputation: 3872

Why don't you use suffix arrays ? It works when you have large number of suffixes.

Complexity, O(n(logn)^2), there are O(nlogn) versions too.

Implementation in c here. You can also try googling suffix arrays.

Upvotes: 0

Jesus is Lord
Jesus is Lord

Reputation: 15399

Assuming constant time addressing of characters within strings this problem is isomorphic to finding the largest prefix.

  1. Let i = 0.

  2. Let S = null

  3. Let c = prefix[i]

  4. Remove strings a from A if a[i] != c and if A. Replace S with a if a.Length == i + 1.

  5. Increment i.

  6. Go to step 3.

Is that what you're looking for?


Example:

prefix = rt.moc.elgoog

array = rt.moc, rt.org, rt.cin.vof, rt.cin, rt

Pass 0: prefix[0] is 'r' and array[j][0] == 'r' for all j so nothing is removed from the array. i + 1 -> 0 + 1 -> 1 is our target length, but none of the strings have a length of 1, so S remains null.

Pass 1: prefix[1] is 't' and array[j][1] == 'r' for all j so nothing is removed from the array. However there is a string that has length 2, so S becomes rt.

Pass 2: prefix[2] is '.' and array[j][2] == '.' for the remaining strings so nothing changes.

Pass 3: prefix[3] is 'm' and array[j][3] != 'm' for rt.org, rt.cin.vof, and rt.cin so those strings are removed.

etc.

Upvotes: 1

haneefmubarak
haneefmubarak

Reputation: 2051

If your array of strings is something along the following:

char string[STRINGS][MAX_STRING_LENGTH];
string[0]="google.com.tr";
string[1]="nic.tr";

etc, then you can simply do this:

int x, max = 0;

for (x = 0; x < STRINGS; x++) {
    if (strlen(string[x]) > max) {
        max = strlen(string[x]);
    }
}

x = 0;

while(true) {
    if (string[max][x] == ".") {
       GOTO out;
    }
    x++;
}

out:

char output[MAX_STRING_LENGTH];
int y = 0;

while (string[max][x] != NULL) {
    output[y++] = string[++x];
}

(The above code may not actually work (errors, etc.), but you should get the general idea.

Upvotes: 0

tacos_tacos_tacos
tacos_tacos_tacos

Reputation: 10585

Naive, pseudo-answer:

  1. Sort array of suffixes by length (yes, there may be strings of same length, which is a problem with the question you are asking I think)
  2. Iterate over array and see if suffix is in given string
  3. If it is, exit the loop because you are done! If not, continue.

Alternatively, you could skip the sorting and just iterate, assigning the biggestString if the currentString is bigger than the biggestString that has matched.

Edit 0:

Maybe you could improve this by looking at your array before hand and considering "minimal" elements that need to be checked.

For instance, if .com appears in 20 members you could just check .com against the given string to potentially eliminate 20 candidates.

Edit 1:

On second thought, in order to compare elements in the array you will need to use a string comparison. My feeling is that any gain you get out of an attempt at optimizing the list of strings for comparison might be negated by the expense of comparing them before doing so, if that makes sense. Would appreciate if a CS type could correct me here...

Upvotes: 0

verbose
verbose

Reputation: 7917

Another naïve, pseudo-answer.

Set boolean "found" to false. While "found" is false, iterate over the array comparing the source string to the strings in the array. If there's a match, set "found" to true and break. If there's no match, use something like strchr() to get to the segment of the string following the first period. Iterate over the array again. Continue until there's a match, or until the last segment of the source string has been compared to all the strings in the array and failed to match.

Not very efficient....

Upvotes: 0

Related Questions