Rich
Rich

Reputation: 12663

Suffix Array Implementation in Java

I'm looking to write an efficient n-order Markov chain method to generate random text strings given a set of example text. I currently have a Java implementation that uses several layers of Maps, but it's clunky. A suffix array would be perfect for my needs, but I'm not clear if that can be efficiently implemented in Java.

In C I might do something like:

char exampleText[MAX];
char *suffixArray[MAX];
...
while(n<MAX && suffixArray[n++] = &exampleText[n]);
sort(suffixArray);

This gets gnarly in Java since I'd have to take substrings of exampleText, or turn suffixArray into an array of indices, or something else.

Any suggestions for a good approach to this in Java?

Upvotes: 1

Views: 6092

Answers (3)

user2271868
user2271868

Reputation:

You can make some variants form array of suffixes:

First:

public static String[] suffixes(String s)
{
int N = s.length();
String[] suffixes = new String[N];
for (int i = 0; i < N; i++)
suffixes[i] = s.substring(i, N);
return suffixes;
}

Second:

public static String[] suffixes(String s)
{
int N = s.length();
StringBuilder sb = new StringBuilder(s);
String[] suffixes = new String[N];
for (int i = 0; i < N; i++)
suffixes[i] = sb.substring(i, N);
return suffixes;
}

Upvotes: 1

jogojapan
jogojapan

Reputation: 70017

For anyone interested in more efficient ways of constructing the suffix array in Java, I once used a library called jsuffixarrays. The code is here. It offers a range of construction algorithms to choose from and I found it to work very well. E.g. to use the SKEW algorithm you do this:

import org.jsuffixarrays.Algorithm;
import org.jsuffixarrays.ISuffixArrayBuilder;
import org.jsuffixarrays.SuffixArrays;
import org.jsuffixarrays.SuffixData;

String              exampleText = "....";
ISuffixArrayBuilder suxBuilder  = Algorithm.SKEW.getDecoratedInstance();
SuffixData          sux         = SuffixArrays.createWithLCP(text,sux_builder);

/* Then, to access the suffix array: */
sux.getSuffixArray();
/* And, to access the LCP array: */
sux.getLCP();

You can build without the LCP array if don't need that.

Upvotes: 2

Tom Hawtin - tackline
Tom Hawtin - tackline

Reputation: 147154

String will [typically] do that for you. (Typical implementations share backing arrays when created with substring, although that is subject to change at any time.)

Upvotes: 2

Related Questions