Reputation: 1232
Given a suffix array, a TopCoder task from SRM 630 asks to find the minium number of distinct characters in the string that could form a string with the given suffix array. The full problem statement can be found on the TopCoder website.
The best solution I found is right here: https://github.com/ftiasch/acm-icpc/blob/6db1ed02a727611830b974a1d4de38bab8f390f9/topcoder/single-round-match/single-round-match-630/SuffixArrayDiv1.java
Here is the algorithm written by ftiasch:
public int minimalCharacters(int[] array) {
int n = array.length;
int[] position = new int[n + 1];
for (int i = 0; i < n; ++i) {
position[array[i]] = i;
}
position[n] = -1;
int[] minimum = new int[n + 1];
for (int i = n - 1; i >= 0; --i) {
minimum[i] = Integer.MAX_VALUE;
for (int j = i + 1; j <= n; ++j) {
boolean valid = true;
for (int x = i; x < j; ++x) {
for (int y = x + 1; y < j; ++y) {
valid &= position[array[x] + 1] < position[array[y] + 1];
}
}
if (valid && minimum[j] < Integer.MAX_VALUE) {
minimum[i] = Math.min(minimum[i], minimum[j] + 1);
}
}
}
return minimum[0];
}
I understand that this is a dynamic programming algorithm but how does it work? I really need a hand understanding it.
Here is what ftiasch wrote me back:
hi Ariel,
First of all, thanks to your compliment. Frankly speaking, my solution is not the best solution to the problem. The optimal one runs in O(n) time but mine takes O(n^4). I just picked this idea during the contest because n is relatively small.
Keep in mind that same characters become continuous in the SA. Since the problem asked for the least number of characters, so I decided to use dynamic programming to partition the SA into consecutive segments so that each segments start with the same character.
Which condition is necessary for S[SA[i]] == S[SA[j]] assumed that i < j? The lexicographic comparison suggests that suffix(SA[i] + 1) should be smaller than suffix(SA[j] + 1). We can easily find that the condition is also sufficient.
Write to me if you have any other question. :)
We finally managed to make it work, thanks to David. Here is the linear time algorithm in java from David's Python version:
public int minimalCharacters(int[] array) {
int n = array.length, i;
if (n == 0)
return 0;
int[] array1 = new int[n + 1];
for (i = 0; i < n; i++)
array1[1 + i] = array[i];
int[] position = new int[n + 1];
for (i = 0; i < n + 1; i++)
position[array1[i]] = i;
int k = 1;
for (i = n; i > 1; i--) {
if (position[array1[i] + 1] <= position[array1[i - 1] + 1])
k++;
}
return k;
}
Upvotes: 3
Views: 1211
Reputation: 65498
Here’s a quadratic-time algorithm. The suffix array specifies for each
pair of suffixes how they compare lexicographically (and the empty
suffix always is less than all of them). Let s
be the unknown string
and suppose that we’re comparing suffix s[i...]
with suffix s[j...]
.
If s[i] != s[j]
, then the comparison of s[i]
and s[j]
settles it.
Otherwise, the result is the same as if we compare s[i+1...]
and
s[j+1...]
.
Suppose that we wish to ensure that s[i...] < s[j...]
. Clearly we need
s[i] <= s[j]
. In fact, unless s[i+1...] < s[j+1...]
, we need the
strict inequality s[i] < s[j]
, as otherwise the tiebreaker will go the
wrong way. Otherwise, s[i] == s[j]
will suffice regardless of the rest
of the string. Gather up all of the inequalities as arcs in a directed
graph with vertices corresponding to positions in s
. This graph is
necessarily acyclic by the total order on suffixes. Make each arc length
1 if the inequality is strict and length 0 otherwise. Output the length
of the longest path, plus one (or zero if the graph is empty).
At least this many distinct letters are needed, by the corresponding
chain of inequalities. What’s perhaps less clear is that this many
distinct letters suffices, but if we determine the label of each
vertex/position in s
by the length of the longest path starting there,
then the head and tail of each arc are labeled appropriately.
To get down to linear time, we can exploit the structure of the graph. It’s straightforward (though not trivial; the graph is metric after all) to show that the path visiting all vertices of the graph is the longest, so we merely have to compute its length.
Below are a transliterated version of the sample code (minChars1
), an
implementation straight from the description above (minChars2
, now
stripped of all comprehension usage), a brute force solution
(minChars3
), and the linear-time solution (minChars4
).
import itertools
def minChars1(array):
n = len(array)
position = [-1] * (n + 1)
for i in range(n):
position[array[i]] = i
infinity = n + 1
minimum = [infinity] * (n + 1)
minimum[n] = 0
for i in range(n - 1, -1, -1):
for j in range(i + 1, n + 1):
valid = True
for x in range(i, j):
for y in range(x + 1, j):
valid = valid and position[array[x] + 1] < position[array[y] + 1]
if valid and minimum[j] < infinity:
minimum[i] = min(minimum[i], minimum[j] + 1)
return minimum[0]
def lengthOfLongestPath(graph, memo, i):
if i not in memo:
result = 0
for w, j in graph[i]:
result = max(result, w + lengthOfLongestPath(graph, memo, j))
memo[i] = result
return memo[i]
def minChars2(array):
n = len(array)
position = [-1] * (n + 1)
for i in range(n):
position[array[i]] = i
graph = {}
for i in range(n):
graph[i] = []
for j in range(n):
if position[i] > position[j]:
w = 0 if position[i + 1] > position[j + 1] else 1
graph[i].append((w, j))
memo = {None: -1}
for i in range(n):
lengthOfLongestPath(graph, memo, i)
return max(memo.values()) + 1
def minChars3(array):
n = len(array)
position = [None] * n
for i in range(n):
position[array[i]] = i
for k in range(n):
for s in itertools.product(range(k), repeat=n):
valid = True
for i in range(n):
for j in range(n):
valid = valid and (s[i:] < s[j:]) == (position[i] < position[j])
if valid:
return k
return n
def minChars4(array):
n = len(array)
if n == 0:
return 0
array1 = [n] * (n + 1)
for i in range(n):
array1[1 + i] = array[i]
position = [None] * (n + 1)
for i in range(n + 1):
position[array1[i]] = i
k = 1
for i in range(n, 1, -1):
if position[array1[i] + 1] <= position[array1[i - 1] + 1]:
k += 1
return k
def test():
for n in range(7):
for array in itertools.permutations(range(n)):
assert minChars1(array) == minChars2(array) == minChars3(array) == minChars4(array)
test()
Upvotes: 4