epoch

Reputation: 16595

Split String at n-th character preserving words

Expanding on this answer, which uses the regex "(?<=\\G.{" + count + "})", I would also like to modify the expression so that it does not split words in the middle.

Example:

String string = "Hello I would like to split this string preserving these words";

If I split on every 10 characters, the result currently looks like this:

[Hello I wo, uld like t, o split th, is string , preserving, these wor, ds]

Question:

Is this even possible using only regex, or would a lexer or some other string manipulation be needed?

UPDATE

This is what I want to use it on:

 + -------------------------------------------JVM Information------------------------------------------ + 
 | sun.boot.class.path : C:\Program Files\Java\jdk1.6.0_33\jre\lib\resources.jar;C:\Program Files\Java\ | 
 |                       jdk1.6.0_33\jre\lib\rt.jar;C:\Program Files\Java\jdk1.6.0_33\jre\lib\sunrsasig | 
 |                       n.jar;C:\Program Files\Java\jdk1.6.0_33\jre\lib\jsse.jar;C:\Program Files\Java | 
 |                       \jdk1.6.0_33\jre\lib\jce.jar;C:\Program Files\Java\jdk1.6.0_33\jre\lib\charset | 
 |                       s.jar;C:\Program Files\Java\jdk1.6.0_33\jre\lib\modules\jdk.boot.jar;C:\Progra | 
 |                       m Files\Java\jdk1.6.0_33\jre\classes                                           | 
 + ---------------------------------------------------------------------------------------------------- + 

The text inside the box is wrapped at the character limit minus the key width; however, this does not look good. This example is also not the only use case; I use that box for multiple types of information.

Upvotes: 2

Views: 2741

Answers (3)

Tk421

Reputation: 6418

I have looked at this problem and none of those replies actually convinced me! Here is my version. It is very likely that it can be improved.

public static String[] splitPreservingWords(String text, int length) {
    return text.replaceAll("(?:\\s*)(.{1,"+ length +"})(?:\\s+|\\s*$)", "$1\n").split("\n");
}
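To see what this expression produces, here is a small runnable sketch that applies the same replaceAll/split idea to the string from the question. The class name RegexWrapDemo and the method name split are hypothetical, chosen only for this illustration.

```java
import java.util.Arrays;

class RegexWrapDemo {
    // Same expression as above: take up to 'length' characters that are
    // followed by whitespace or the end of the text, then mark the cut
    // with a newline and split on it.
    static String[] split(String text, int length) {
        return text
            .replaceAll("(?:\\s*)(.{1," + length + "})(?:\\s+|\\s*$)", "$1\n")
            .split("\n");
    }

    public static void main(String[] args) {
        String s = "Hello I would like to split this string preserving these words";
        // prints [Hello I, would like, to split, this, string, preserving, these, words]
        System.out.println(Arrays.toString(split(s, 10)));
    }
}
```

Because the newline is used as the cut marker, input that already contains line breaks is also split at those breaks.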

Upvotes: 4

Qtax

Reputation: 33908

"Not split words in the middle" does not define what should happen when the split point falls inside a word.

Given the split length being 10 and the string:

Hello I would like to split this string preserving these words

Suppose you want to split right after the word in that case, resulting in the list:

Hello I would, like to split, this string, preserving, these words

You can accomplish all kinds of tricky "splits" by using plain matching.

Simply match all occurrences of this expression:

(?s)\G.{10,}?\b

(Using (?s) to turn on the DOTALL flag.)

In Perl it's as simple as @array = $str =~ /\G.{10,}?\b/gs, but Java seems to lack a quick function to return all matches, so you'd probably have to use a matcher and push the results on to an array/list.
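In Java, the matcher loop could look roughly like the sketch below. The class name RegexChunker and the method name chunks are hypothetical. Two details are additions for this illustration rather than part of the expression itself: each match is trimmed, because every match after the first starts with the separating space, and any tail shorter than n characters (which the expression cannot match) is appended at the end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexChunker {
    // Collects all matches of (?s)\G.{n,}?\b into a list.
    static List<String> chunks(String text, int n) {
        Pattern p = Pattern.compile("(?s)\\G.{" + n + ",}?\\b");
        Matcher m = p.matcher(text);
        List<String> result = new ArrayList<>();
        int end = 0; // end offset of the last successful match
        while (m.find()) {
            result.add(m.group().trim());
            end = m.end();
        }
        // Assumption for this sketch: keep a short tail instead of dropping it.
        String tail = text.substring(end).trim();
        if (!tail.isEmpty()) {
            result.add(tail);
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "Hello I would like to split this string preserving these words";
        // prints [Hello I would, like to split, this string, preserving, these words]
        System.out.println(chunks(s, 10));
    }
}
```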

Upvotes: 1

tobias_k

Reputation: 82889

No regex, but it seems to work:

List<String> parts = new ArrayList<String>();
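// stop once the remainder fits within n characters, so a short tail is not split again
while (string.length() > n) {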
    // look for space to the left of n-th character
    int index = string.lastIndexOf(" ", n);
    if (index == -1) {
        // no space to the left (very long word) -> next space to the right
        // change this to 'index = n' to break words in this case
        index = string.indexOf(" ", n);
    }
    if (index == -1) {
        break;
    }
    parts.add(string.substring(0,  index));
    string = string.substring(index+1);
}
parts.add(string);

This first checks whether there is a space at or to the left of the n-th character; if so, the string is split there. Otherwise, it looks for the next space to the right. Alternatively, you could break the word in this case.
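For reference, here is the same loop wrapped in a method so it can be run on the string from the question. The class name SpaceSplitter and the method name split are hypothetical. The loop in this sketch stops as soon as the remainder fits within n characters, so that a short tail containing a space is not split further.

```java
import java.util.ArrayList;
import java.util.List;

class SpaceSplitter {
    static List<String> split(String string, int n) {
        List<String> parts = new ArrayList<>();
        while (string.length() > n) {
            // nearest space at or to the left of the n-th character
            int index = string.lastIndexOf(' ', n);
            if (index == -1) {
                // very long word: fall back to the next space to the right
                index = string.indexOf(' ', n);
            }
            if (index == -1) {
                break; // no space left at all
            }
            parts.add(string.substring(0, index));
            string = string.substring(index + 1);
        }
        parts.add(string);
        return parts;
    }

    public static void main(String[] args) {
        String s = "Hello I would like to split this string preserving these words";
        // prints [Hello I, would like, to split, this, string, preserving, these, words]
        System.out.println(split(s, 10));
    }
}
```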

Upvotes: 1
