Deploymental
Deploymental

Reputation: 31

Finding the strings in a TreeSet that start with a given prefix

I'm trying to find the strings in a TreeSet<String> that start with a given prefix. I found a previous question asking for the same thing — Searching for a record in a TreeSet on the fly — but the answer given there doesn't work for me, because it assumes that the strings don't include Character.MAX_VALUE, and mine can.

(The answer there is to use treeSet.subSet(prefix, prefix + Character.MAX_VALUE), which gives all strings between prefix (inclusive) and prefix + Character.MAX_VALUE (exclusive), which comes out to all strings that start with prefix except those that start with prefix + Character.MAX_VALUE. But in my case I need to find all strings that start with prefix, including those that start with prefix + Character.MAX_VALUE.)

How can I do this?

Upvotes: 1

Views: 2213

Answers (2)

Alexey
Alexey

Reputation: 46

If anybody is looking for a shorter version of ruakh's answer:

First element is actually set.ceiling(prefix),and last - you have to increment the prefix and use set.floor(next_prefix)

public NavigableSet<String> subSetWithPrefix(NavigableSet<String> set, String prefix) {
    String first = set.ceiling(prefix);
    char[] chars = prefix.toCharArray();
    if(chars.length>0)
        chars[chars.length-1] = (char) (chars[chars.length-1]+1);
    String last = set.floor(new String(chars));
    if(first==null || last==null || last.compareTo(first)<0)
        return new TreeSet<>();
    return set.subSet(first, true, last, true);
}

Upvotes: 0

ruakh
ruakh

Reputation: 183446

To start with, I suggest re-examining your requirements. Character.MAX_VALUE is U+FFFF, which is not a valid Unicode character and never will be; so I can't think of a good reason why you would need to support it.

But if there's a good reason for that requirement, then — you need to "increment" your prefix to compute the least string that's greater than all strings starting with your prefix. For example, given "city", you need "citz". You can do that as follows:

/**
 * @param prefix
 * @return The least string that's greater than all strings starting with
 *         prefix, if one exists. Otherwise, returns Optional.empty().
 *         (Specifically, returns Optional.empty() if the prefix is the
 *         empty string, or is just a sequence of Character.MAX_VALUE-s.)
 */
private static Optional<String> incrementPrefix(final String prefix) {
    final StringBuilder sb = new StringBuilder(prefix);

    // remove any trailing occurrences of Character.MAX_VALUE:
    while (sb.length() > 0 && sb.charAt(sb.length() - 1) == Character.MAX_VALUE) {
        sb.setLength(sb.length() - 1);
    }

    // if the prefix is empty, then there's no upper bound:
    if (sb.length() == 0) {
        return Optional.empty();
    }

    // otherwise, increment the last character and return the result:
    sb.setCharAt(sb.length() - 1, (char) (sb.charAt(sb.length() - 1) + 1));
    return Optional.of(sb.toString());
}

To use it, you need to use subSet when the above method returns a string, and tailSet when it returns nothing:

/**
 * @param allElements - a SortedSet of strings. This set must use the
 *                      natural string ordering; otherwise this method
 *                      may not behave as intended.
 * @param prefix
 * @return The subset of allElements containing the strings that start
 *         with prefix.
 */
private static SortedSet<String> getElementsWithPrefix(
        final SortedSet<String> allElements, final String prefix) {

    final Optional<String> endpoint = incrementPrefix(prefix);

    if (endpoint.isPresent()) {
        return allElements.subSet(prefix, endpoint.get());
    } else {
        return allElements.tailSet(prefix);
    }
}

See it in action at: http://ideone.com/YvO4b3.

Upvotes: 1

Related Questions