portfoliobuilder

Reputation: 7856

Breaking up paragraphs into String tokens

I am able to break up paragraphs of text into substrings based on a given character limit. The problem is that my algorithm does exactly this and therefore breaks up words. This is where I am stuck: if the character limit falls in the middle of a word, how can I backtrack to a space so that every substring contains only whole words?

This is the algorithm I am using:

int arrayLength = 0;
arrayLength = (int) Math.ceil(((mText.length() / (double) charLimit)));

String[] result = new String[arrayLength];
int j = 0;
int lastIndex = result.length - 1;
for (int i = 0; i < lastIndex; i++) {
    result[i] = mText.substring(j, j + charLimit);
    j += charLimit;
}

result[lastIndex] = mText.substring(j);

charLimit can be set to any positive integer value, and mText is a String holding a paragraph of text. Any suggestions on how I can improve this? Thank you in advance.

I am receiving good responses. Just so you know, this is the while loop I used to figure out whether I landed on a space or not. I just do not know how to correct the string from this point.

while (!strTemp.substring(strTemp.length() - 1).equalsIgnoreCase(" ")) {
    // somehow refine string before added to array
}

Upvotes: 2

Views: 168

Answers (1)

Trudbert

Reputation: 3188

I am not sure I understood correctly what you wanted, but here is an answer to my interpretation:

You could find the last space before your character limit with lastIndexOf and then check whether it is close enough to the limit (this handles text without whitespace), i.e.:

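int arrayLength = (int) Math.ceil(mText.length() / (double) charLimit);

String[] result = new String[arrayLength];
int j = 0;           // start index of the current token
int tolerance = 10;  // how far before the limit a space may lie
int splitpoint;
int lastIndex = result.length - 1;
for (int i = 0; i < lastIndex; i++) {
    // last space at or before the hard limit (-1 if there is none)
    splitpoint = mText.lastIndexOf(' ', j + charLimit);
    // use the space only if it is within tolerance of the limit, otherwise cut at the limit
    splitpoint = splitpoint > j + charLimit - tolerance ? splitpoint : j + charLimit;
    result[i] = mText.substring(j, splitpoint).trim();
    j = splitpoint;
}

// whatever remains goes into the last slot
result[lastIndex] = mText.substring(j).trim();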

This searches for the last space at or before the character limit. If that space is less than tolerance characters away from the limit (10 is just an example value), the string is split there; otherwise it is split at charLimit.
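
The split-point rule on its own can be sketched as a small helper, which makes the two cases easier to see. This is only an illustration: the class name SplitPointDemo and the method chooseSplitPoint are hypothetical names introduced here, and the helper assumes start + charLimit does not run past the end of the text.

```java
public class SplitPointDemo {
    // Hypothetical helper: returns the index at which the token starting at `start` should end.
    public static int chooseSplitPoint(String text, int start, int charLimit, int tolerance) {
        int hardLimit = start + charLimit;
        // Last space at or before the hard limit (-1 if the text has no such space).
        int lastSpace = text.lastIndexOf(' ', hardLimit);
        // Accept the space only if it lies less than `tolerance` characters before the limit.
        return lastSpace > hardLimit - tolerance ? lastSpace : hardLimit;
    }

    public static void main(String[] args) {
        // Space at index 5 is within tolerance of the limit 8, so the cut moves back to 5.
        System.out.println(chooseSplitPoint("hello world", 0, 8, 5));
        // No space at all, so the cut falls back to the hard limit 8.
        System.out.println(chooseSplitPoint("abcdefghijkl", 0, 8, 5));
    }
}
```

A larger tolerance keeps more words intact at the cost of shorter tokens; a tolerance of 0 always cuts at the hard limit.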

The only problem with this solution is that the last string token can be longer than charLimit, so you might need to adjust arrayLength and loop while (mText.length() - j > charLimit).
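
One way to avoid computing the array length up front is to collect the tokens in a list and keep looping while more than charLimit characters remain. A minimal sketch under that assumption follows; the class WordChunker and its split method are hypothetical names, the argument check is an added safeguard, and empty tokens are skipped as a design choice that the original loop does not make.

```java
import java.util.ArrayList;
import java.util.List;

public class WordChunker {
    // Hypothetical helper: splits text into tokens of at most charLimit characters,
    // preferring to cut at a space that lies within `tolerance` characters of the limit.
    public static List<String> split(String text, int charLimit, int tolerance) {
        if (charLimit <= 0 || tolerance < 0 || tolerance > charLimit) {
            throw new IllegalArgumentException("need charLimit > 0 and 0 <= tolerance <= charLimit");
        }
        List<String> tokens = new ArrayList<>();
        int j = 0;
        // Keep cutting while the remainder is still too long to be the final token.
        while (text.length() - j > charLimit) {
            int hardLimit = j + charLimit;
            int splitPoint = text.lastIndexOf(' ', hardLimit);
            if (splitPoint <= hardLimit - tolerance) {
                splitPoint = hardLimit; // no space close enough: cut mid-word
            }
            addIfNotEmpty(tokens, text.substring(j, splitPoint));
            j = splitPoint;
        }
        addIfNotEmpty(tokens, text.substring(j));
        return tokens;
    }

    // Trims the token and drops it if only whitespace was left.
    private static void addIfNotEmpty(List<String> tokens, String raw) {
        String token = raw.trim();
        if (!token.isEmpty()) {
            tokens.add(token);
        }
    }

    public static void main(String[] args) {
        // prints [the quick, brown fox, jumps]
        System.out.println(split("the quick brown fox jumps", 10, 5));
    }
}
```

Because every cut happens at or before j + charLimit and the loop only stops once the remainder fits, no token in the returned list exceeds charLimit.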


Edit

Running sample code:

public class Main {
    public static void main(String[] args) {
        String mText = "I am able to break up paragraphs of text into substrings based upon nth given character limit. The conflict I have is that my algorithm is doing exactly this, and is breaking up words. This is where I am stuck. If the character limit occurs in the middle of a word, how can I back track to a space so that all my substrings have entire words?";

        int charLimit = 40;
        int arrayLength = (int) Math.ceil(mText.length() / (double) charLimit);

        String[] result = new String[arrayLength];
        int j = 0;
        int tolerance = 10;
        int splitpoint;
        int lastIndex = result.length - 1;
        for (int i = 0; i < lastIndex; i++) {
            // last space at or before the limit; fall back to the limit if it is too far away
            splitpoint = mText.lastIndexOf(' ', j + charLimit);
            splitpoint = splitpoint > j + charLimit - tolerance ? splitpoint : j + charLimit;
            result[i] = mText.substring(j, splitpoint);
            j = splitpoint;
        }

        result[lastIndex] = mText.substring(j);

        for (int i = 0; i < arrayLength; i++) {
            System.out.println(result[i]);
        }
    }
}

Output:

I am able to break up paragraphs of text
 into substrings based upon nth given
 character limit. The conflict I have is
 that my algorithm is doing exactly
 this, and is breaking up words. This is
 where I am stuck. If the character
 limit occurs in the middle of a word,
 how can I back track to a space so that
 all my substrings have entire words?

Additional edit: added trim() as suggested by curiosu. It removes the whitespace surrounding the string tokens.

Upvotes: 3
