Reputation: 16595
Expanding on this answer, using this regex (?<=\\G.{" + count + "}), I would also like to modify the expression to not split words in the middle.
Example:
String string = "Hello I would like to split this string preserving these words";
If I split on 10 characters, the result currently looks like this:
[Hello I wo, uld like t, o split th, is string , preserving, these wor, ds]
Question:
Is this even possible using only regex, or would a lexer or some other string manipulation be needed?
UPDATE
This is what I want to use it on:
+ -------------------------------------------JVM Information------------------------------------------ +
| sun.boot.class.path : C:\Program Files\Java\jdk1.6.0_33\jre\lib\resources.jar;C:\Program Files\Java\ |
| jdk1.6.0_33\jre\lib\rt.jar;C:\Program Files\Java\jdk1.6.0_33\jre\lib\sunrsasig |
| n.jar;C:\Program Files\Java\jdk1.6.0_33\jre\lib\jsse.jar;C:\Program Files\Java |
| \jdk1.6.0_33\jre\lib\jce.jar;C:\Program Files\Java\jdk1.6.0_33\jre\lib\charset |
| s.jar;C:\Program Files\Java\jdk1.6.0_33\jre\lib\modules\jdk.boot.jar;C:\Progra |
| m Files\Java\jdk1.6.0_33\jre\classes |
+ ---------------------------------------------------------------------------------------------------- +
The box surrounding it wraps values at the character limit minus the key width; however, cutting them mid-word does not look good. This example is also not the only use case: I use that box for multiple types of information.
Upvotes: 2
Views: 2741
Reputation: 6418
I have looked at this problem and none of those replies actually convinced me! Here is my version. It is very likely that it can be improved.
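public static String[] splitPreservingWords(String text, int length) {
    // take up to `length` characters that end at whitespace (or end of input),
    // mark each chunk with a newline, then split on those newlines
    return text.replaceAll("(?:\\s*)(.{1," + length + "})(?:\\s+|\\s*$)", "$1\n").split("\n");
}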
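To see what this approach produces for the string from the question, here is a small self-contained sketch. The class name RegexWrapDemo is only illustrative, and the expected output in the comment comes from tracing the regex by hand for a limit of 10.

```java
import java.util.Arrays;

public class RegexWrapDemo {
    // Same regex idea as above: each chunk is at most `length` characters
    // and must be followed by whitespace or the end of the input.
    static String[] splitPreservingWords(String text, int length) {
        return text.replaceAll("(?:\\s*)(.{1," + length + "})(?:\\s+|\\s*$)", "$1\n").split("\n");
    }

    public static void main(String[] args) {
        String text = "Hello I would like to split this string preserving these words";
        System.out.println(Arrays.toString(splitPreservingWords(text, 10)));
        // prints [Hello I, would like, to split, this, string, preserving, these, words]
    }
}
```

Note that a word longer than the limit appears to be left unsplit by this regex, and that `.` does not match line breaks, so input that already contains newlines would need extra care.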
Upvotes: 4
Reputation: 33908
"Not split words in the middle" does not say where the split should happen instead.
Given a split length of 10 and the string:
Hello I would like to split this string preserving these words
you may want to split right after a word, which results in this list:
Hello I would, like to split, this string, preserving, these words
You can accomplish all kinds of tricky "splits" by using plain matching.
Simply match all occurrences of this expression:
(?s)\G.{10,}?\b
(Using (?s) to turn on the DOTALL flag.)
In Perl it's as simple as @array = $str =~ /\G.{10,}?\b/gs, but Java seems to lack a quick function to return all matches, so you'd probably have to use a matcher and push the results onto an array/list.
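A minimal sketch of that matcher loop in Java could look like the following. The class and method names (MatchAllDemo, splitAfterWord) are made up for illustration. Trimming each match and keeping an unmatched tail are my own additions, since the pattern itself leaves the leading space in each chunk and does not match a remainder shorter than the minimum length.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchAllDemo {
    // Hypothetical helper: collects every match of (?s)\G.{minLength,}?\b.
    // Assumes minLength >= 1.
    static List<String> splitAfterWord(String text, int minLength) {
        Pattern pattern = Pattern.compile("(?s)\\G.{" + minLength + ",}?\\b");
        Matcher matcher = pattern.matcher(text);
        List<String> parts = new ArrayList<>();
        int end = 0;
        while (matcher.find()) {
            // each match after the first starts with the separating space, so trim it
            parts.add(matcher.group().trim());
            end = matcher.end();
        }
        // the pattern needs at least minLength characters, so a shorter tail is added by hand
        String rest = text.substring(end).trim();
        if (!rest.isEmpty()) {
            parts.add(rest);
        }
        return parts;
    }

    public static void main(String[] args) {
        String text = "Hello I would like to split this string preserving these words";
        System.out.println(splitAfterWord(text, 10));
        // prints [Hello I would, like to split, this string, preserving, these words]
    }
}
```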
Upvotes: 1
Reputation: 82889
No regex, but it seems to work:
List<String> parts = new ArrayList<String>();
// only split while the rest is longer than n; otherwise a tail that already fits would be split at its last space
while (string.length() > n) {
    // look for a space at or to the left of the n-th character
    int index = string.lastIndexOf(" ", n);
    if (index == -1) {
        // no space to the left (very long word) -> next space to the right
        // to break words instead, cut at n and keep the character at n
        // (substring(index + 1) below would drop it)
        index = string.indexOf(" ", n);
    }
    if (index == -1) {
        break;
    }
    parts.add(string.substring(0, index));
    string = string.substring(index + 1);
}
parts.add(string);
This first looks for a space at or to the left of the n-th character. If there is one, the string is split there. Otherwise, it looks for the next space to the right. Alternatively, you could break the word in this case.
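The word-breaking alternative needs one extra detail: when the cut falls inside a word there is no space to skip, so no character may be dropped. Here is a hedged sketch of both behaviours as one method. The names WordWrap, wrap and breakLongWords are illustrative assumptions, not part of the snippet above, and the loop only runs while the remaining text is longer than n so that a tail which already fits is not split again.

```java
import java.util.ArrayList;
import java.util.List;

public class WordWrap {
    // Hypothetical helper; breakLongWords is an illustrative flag.
    static List<String> wrap(String string, int n, boolean breakLongWords) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        List<String> parts = new ArrayList<>();
        while (string.length() > n) {
            // last space at or before the n-th character
            int index = string.lastIndexOf(' ', n);
            int skip = 1; // drop the space we split on
            if (index == -1) {
                if (breakLongWords) {
                    index = n; // cut inside the word
                    skip = 0;  // nothing to drop, keep every character
                } else {
                    index = string.indexOf(' ', n); // next space to the right
                    if (index == -1) {
                        break; // one long word left, keep it whole
                    }
                }
            }
            parts.add(string.substring(0, index));
            string = string.substring(index + skip);
        }
        parts.add(string);
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(wrap("abcdefghijklmno pq", 10, false)); // prints [abcdefghijklmno, pq]
        System.out.println(wrap("abcdefghijklmno pq", 10, true));  // prints [abcdefghij, klmno pq]
    }
}
```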
Upvotes: 1