Reputation: 29
Given a string A and a set of string S. Need to find an optimum method to find a prefix of A which is not a prefix of any of the strings in S.
Example
A={apple}
S={april,apprehend,apprehension}
Output should be "appl" and not "app" since "app" is prefix of both "apple" and "apprehension" but "appl" is not.
I know the trie approach; by making a trie of set S and then traversing in the trie for string A.
But what I want to ask is can we do it without trie?
Like can we compare every pair (A,Si), Si = ith string from set S and get the largest common prefix out of them.In this case that would be "app" , so now the required ans would be "appl".
This would take 2 loops(one for iterating through S and another for comparing Si and A).
Can we improve upon this??
Please suggest an optimum approach.
Upvotes: 1
Views: 222
Reputation: 55609
I'm not sure exactly what you had in mind, but here's one way to do it:
Keep a variable longest
, initialised to 0
.
Loop over all elements S[i]
of S,
setting longest = max(longest, matchingPrefixLength(S[i], A))
.
Return the prefix from A
of length longest+1
.
This uses O(1)
space and takes O(length(S)*average length of S[i])
time.
This is optimal (at least for the worst case) since you can't get around needing to look at every character of every element in S
.
Example:
A={apple}
S={april,apprehend,apprehension}
longest = 0
The longest prefix for S[0] and A is 2
So longest = max(0,2) = 2
The longest prefix for S[1] and A is 3
So longest = max(2,3) = 3
The longest prefix for S[2] and A is 3
So longest = max(3,3) = 3
Now we return the prefix of length longest+1 = 4, i.e. "appl"
Note that there are actually 2 trie-based approaches:
Store only A
in the trie. Iterate through the trie for each element from S
to eliminate prefixes.
This uses much less memory than the second approach (but still more than the approach above). At least assuming A
isn't much, much longer than S[i]
, but you can optimise to stop at the longest element in S
or construct the tree as we go to avoid this case.
Store all elements from S
in the trie. Iterate through the trie with A
to find the shortest non-matching prefix.
This approach is significantly faster if you have lots of A
's that you want to query for a constant set S
(since you only have to set up the trie once, and do a single lookup for each A
, where-as you have to create a new trie and run through each S[i]
for each A
for the first approach).
Upvotes: 2
Reputation: 28302
What is your input size?
Let's model your input as being of N+1 strings whose lengths are about M characters. Your total input size is about M(N+1) character, plus some proportional amount of apparatus to encode that data in a usable format (data structure overhead).
Your algorithm ...
maxlen = 0
for i = 1 to N
for j = 1 to M
if A[j] = S[i][j] then
if j > maxlen then maxlen = j
break
print A[1...maxlen]
... performs up M x N iterations of the innermost loop, reading two characters each time, for a total of 2MN characters read.
Recall our input data size was about M(N+1) also. So our question now is whether we can solve this problem, in the worst case, looking at asymptotically less than the total input (you do a little less than looking at all the input twice, or linear in the input size). The answer is no. Consider this worst case:
Any algorithm must look at M'-1 characters of N-1 strings, plus M' characters of 1 string, to correctly determine the answer of this problem instance is A.
(M'-1)(N'-1) + N = M'N - M' - N + 1 + N = M'N - M' + 1
For N >= 2, the dominant terms in both M'(N+1) and M'N' are both M'N, meaning that for N >= 2, both the input size and the amount of that input any correct algorithm must read is O(MN). Your algorithm is O(MN). Any other algorithm cannot be asymptotically better.
Upvotes: 0