ZolaKt

Reputation: 4721

Java text splitting algorithm

I have a large string of text. I need to split it into a few pieces (according to a maximum character limit), run some operations on each piece independently, and merge the results at the end.

A pretty simple task. I'm just looking for an algorithm that splits the text naturally: it shouldn't split into fixed-size substrings, and it shouldn't cut words in half.

For example (* is the 100th char, max char limit is set to 100):

....split me aro*und here...

the 1st fragment should contain: ...split me

the 2nd fragment should be: around here...

Working in Java btw.

Upvotes: 1

Views: 2244

Answers (4)

Richard Barnett

Reputation: 1108

WordUtils.wrap() from Jakarta (now Apache) commons-lang is close, with two caveats:

  • It only breaks on spaces
  • It doesn't return a list, but you can choose a "line separator" that's unlikely to occur in the text and then split on that

Upvotes: 1

Peter Lawrey

Reputation: 533442

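You could use String.lastIndexOf(String str, int fromIndex), which searches backward from the given index, to find the last separator that still fits within the limit.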

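import java.util.ArrayList;
import java.util.List;

// Splits text into pieces of at most maxLength characters, breaking only at sep.
public static List<String> splitByText(String text, String sep, int maxLength) {
    List<String> ret = new ArrayList<>();
    int start = 0;
    while (start + maxLength < text.length()) {
        // Last separator that starts at or before the length limit
        int index = text.lastIndexOf(sep, start + maxLength);
        if (index < start) {
            // No separator in this window, so a single word exceeds maxLength
            throw new IllegalArgumentException("Unable to break into strings of " +
                    "no more than " + maxLength);
        }
        ret.add(text.substring(start, index));
        // Skip the separator so it does not start the next piece
        start = index + sep.length();
    }
    // Whatever remains fits within maxLength
    ret.add(text.substring(start));
    return ret;
}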

Calling it with

System.out.println(splitByText("....split me around here...", " ", 14));

prints

[....split me, around here...]
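
If throwing on a word longer than maxLength is not acceptable, one possible variation is to cut such a word at the limit instead. The following is only a sketch under that assumption: the class SoftSplit and the method splitSoft are hypothetical names, not part of the answer above, and pieces produced by a forced cut can no longer be merged back simply by joining with the separator.

```java
import java.util.ArrayList;
import java.util.List;

class SoftSplit {
    // Hypothetical variant of splitByText: same backward search for the separator,
    // but a window without a usable separator is cut at maxLength instead of throwing.
    static List<String> splitSoft(String text, String sep, int maxLength) {
        if (maxLength <= 0 || sep.isEmpty()) {
            throw new IllegalArgumentException("maxLength must be positive and sep non-empty");
        }
        List<String> pieces = new ArrayList<>();
        int start = 0;
        while (start + maxLength < text.length()) {
            int index = text.lastIndexOf(sep, start + maxLength);
            if (index <= start) {
                // No separator after the start of this window: forced cut at the limit
                pieces.add(text.substring(start, start + maxLength));
                start += maxLength;
            } else {
                pieces.add(text.substring(start, index));
                start = index + sep.length(); // skip the separator itself
            }
        }
        pieces.add(text.substring(start)); // the remainder fits within maxLength
        return pieces;
    }

    public static void main(String[] args) {
        System.out.println(splitSoft("....split me around here...", " ", 14));
        System.out.println(splitSoft("abcdefghij klm", " ", 4));
    }
}
```

For the sample text this gives the same two fragments as splitByText; for "abcdefghij klm" with a limit of 4, the long word is cut into abcd, efgh and ij, followed by klm.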

Upvotes: 1

evilReiko

Reputation: 20473

If you're using Swing for your chat, then you can handle it like this:

// textarea is a JTextArea instance
textarea.setLineWrap(true);
textarea.setWrapStyleWord(true);

Upvotes: 0

Jim

Reputation: 22646

The Wikipedia article on word wrapping discusses this problem. It also links to an algorithm by Knuth.
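
The simplest approach usually described for word wrapping is the greedy one: keep adding words to the current line while they fit, and start a new line when the next word would exceed the width. Here is a rough sketch of that idea, not of Knuth's algorithm (which optimizes how evenly the lines are filled). The class GreedyWrap and the method wrap are hypothetical names, and the sketch assumes words are separated by whitespace, which it collapses to single spaces.

```java
import java.util.ArrayList;
import java.util.List;

class GreedyWrap {
    // Greedy word wrap: fill the current line with as many words as fit in width,
    // then start a new line. Runs of whitespace are collapsed to one space.
    static List<String> wrap(String text, int width) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // happens only when the text is empty or all whitespace
            }
            // +1 accounts for the space that would precede the word
            if (line.length() > 0 && line.length() + 1 + word.length() > width) {
                lines.add(line.toString());
                line.setLength(0);
            }
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(word); // a word longer than width gets a line of its own
        }
        if (line.length() > 0) {
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(wrap("....split me around here...", 14));
    }
}
```

Because whitespace is normalized, joining the lines with a single space restores the words in order but not necessarily the original spacing.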

Upvotes: 7
