AliAvci

Reputation: 1197

Split a paragraph into a list of strings, each not exceeding a given size and avoiding splitting words in half

Question

How can the following be done in an idiomatic way: Split a large String into a list of Strings, each not exceeding the given size, and avoiding splitting words in half.

Closest solution with String.chunked() (Splits words)

The closest built-in solution is the String class's chunked() method. However, it cuts at fixed character offsets and therefore splits words in the given String.

Code example of use of String.chunked()

val longString = "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod " +
    "tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, " +
    "quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo " +
    "consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse " +
    "cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non " +
    "proident, sunt in culpa qui officia deserunt mollit anim id est laborum. "

// Split [longString] into list
val listOfStrings = longString.chunked(40)
listOfStrings.forEach {
    println(it)
}

Example output of closest example with String.chunked()

Below is the output of the example code above. As can be seen, words are split across the ends of lines.

Lorem ipsum dolor sit amet, consectetur
adipisicing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna ali
qua. Ut enim ad minim veniam, quis nostr
ud exercitation ullamco laboris nisi ut
aliquip ex ea commodo consequat. Duis au
te irure dolor in reprehenderit in volup
tate velit esse cillum dolore eu fugiat
nulla pariatur. Excepteur sint occaecat
cupidatat non proident, sunt in culpa qu
i officia deserunt mollit anim id est la
borum.

Upvotes: 4

Views: 1064

Answers (2)

Lino

Reputation: 19926

You could use this simple helper function:

fun splitIntoChunks(max: Int, string: String): List<String> = ArrayList<String>(string.length / max + 1).also {
    val builder = StringBuilder()

    // split string on any run of whitespace, dropping empty tokens
    for (word in string.split(Regex("\\s+")).filter(String::isNotEmpty)) {
        // if appending the word (plus its separating space) would exceed the max size
        if (builder.isNotEmpty() && builder.length + 1 + word.length > max) {
            // then we add the string to the list and clear the builder
            it.add(builder.toString())
            builder.setLength(0)
        }
        // append a space before each word, except the first one of a chunk
        if (builder.isNotEmpty()) builder.append(' ')
        // note: a single word longer than max is kept whole in its own chunk
        builder.append(word)
    }

    // add the last collected part if there was any
    if (builder.isNotEmpty()) {
        it.add(builder.toString())
    }
}

It can then be called like this:

val chunks: List<String> = splitIntoChunks(20, longString)
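To check that a splitter really keeps words intact and respects the limit, a small validation helper can be useful. The following is only a sketch: isValidWordChunking is a hypothetical name, and it assumes that words are separated by whitespace and that chunks may be compared word by word with the original text.

```kotlin
// Hypothetical helper (not part of the answer above): returns true if every chunk
// has at most `max` characters and the chunks contain exactly the same
// whitespace-separated words, in the same order, as the original text.
fun isValidWordChunking(original: String, chunks: List<String>, max: Int): Boolean {
    val whitespace = Regex("\\s+")
    val originalWords = original.split(whitespace).filter { it.isNotEmpty() }
    val chunkWords = chunks.flatMap { it.split(whitespace) }.filter { it.isNotEmpty() }
    return chunks.all { it.length <= max } && chunkWords == originalWords
}
```

For example, passing the result of "hello world".chunked(4) fails this check because the word "hello" is cut in two, whereas listOf("hello", "world") with a limit of 5 passes.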

Upvotes: 1

Roland

Reputation: 23262

This is probably not the most idiomatic way, but maybe it suffices for your needs:

fun String.chunkedWords(limitChars: Int,
                        delimiter: Char = ' ',
                        joinCharacter: Char = '\n') =
    splitToSequence(delimiter)
        .reduce { cumulatedString, word ->
          // length of the current line, i.e. of everything after the last joinCharacter
          val currentLineLength = cumulatedString.length - cumulatedString.indexOfLast { it == joinCharacter } - 1
          val exceedsSize = currentLineLength + "$delimiter$word".length > limitChars
          // start a new line if the word no longer fits, otherwise continue the current one
          val separator = if (exceedsSize) joinCharacter else delimiter
          cumulatedString + separator + word
        }

You can then use it as follows:

longString.chunkedWords(40).run(::println)

which for your given string would then print:

Lorem ipsum dolor sit amet, consectetur
adipisicing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna
aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris
nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit
in voluptate velit esse cillum dolore eu
fugiat nulla pariatur. Excepteur sint
occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim
id est laborum.

You could also split it into lines from there, e.g. longString.chunkedWords(40).splitToSequence("\n"). Note that it also works if the string already contains newline characters, i.e. if you have a String like "Testing shorter lines.\nAnd now there comes a very long line", a call of .chunkedWords(17) will produce the following output:

Testing shorter
lines.
And now there   // after the existing newline, the full 17 characters are available again
comes a very long
line
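Since the question asks for a list of strings rather than one joined String, a variant that builds the list directly might look like the sketch below. wrapWordsToList is a hypothetical name, not part of the answer above. Unlike chunkedWords, it assumes that every run of whitespace (including existing newlines) is just a word separator, and it keeps a single word longer than the limit whole on its own line.

```kotlin
// Hypothetical helper: wraps words into lines of at most `limit` characters
// and returns the lines as a list.
fun String.wrapWordsToList(limit: Int): List<String> =
    split(Regex("\\s+"))
        .filter { it.isNotEmpty() }
        .fold(mutableListOf<String>()) { lines, word ->
            val last = lines.lastOrNull()
            if (last != null && last.length + 1 + word.length <= limit) {
                // the word still fits on the current line, separated by one space
                lines[lines.lastIndex] = "$last $word"
            } else {
                // start a new line; a word longer than `limit` is not broken up
                lines.add(word)
            }
            lines
        }
```

It could be used as longString.wrapWordsToList(40).forEach(::println). Using fold with a mutable accumulator avoids rescanning the whole accumulated text for the last newline on every word.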

Upvotes: 5
