Leo E

Reputation: 829

Join list of sentences up to max character limit

I have a list in which each item is a sentence. I want to join the items as long as the new combined item does not go over a character limit.

You can join items in a list fairly easily.

>>> x = ['Alice went to the market.', 'She bought an apple.', 'And she then went to the park.']
>>> ' '.join(x)
'Alice went to the market. She bought an apple. And she then went to the park.'

Now say I would like to sequentially join the items as long as the new combined item is not greater than 50 characters.

The result would be:

['Alice went to the market. She bought an apple.','And she then went to the park.']

Maybe a list comprehension or a conditional iterator would work, but with both I run into problems where the sentences get cut off.

Upvotes: 3

Views: 1984

Answers (6)

Chuck Wilbur

Reputation: 2620

I started from Joe's answer, pulled out the max index with the first_greater_elem method from this answer, and came up with this set of helper methods.

from typing import List


def combine_messages(message_array: List[str], max_length: int) -> List[str]:
    lengths = list(map(len, message_array))
    # Joined length of the first i + 1 messages: i spaces sit between them.
    sums = [sum(lengths[:i + 1]) + i for i in range(len(message_array))]
    # Take at least one message so an over-long message cannot recurse forever.
    max_index = max(1, first_greater_elem(sums, max_length))
    if max_index < len(message_array):
        result = [" ".join(message_array[:max_index])]
        result.extend(combine_messages(message_array[max_index:], max_length))
        return result
    return [" ".join(message_array)]


def first_greater_elem(lst, elem):
    # Index of the first item strictly greater than elem, or len(lst) if none.
    for i, item in enumerate(lst):
        if item > elem:
            return i
    return len(lst)

It recursively continues to combine elements into strings no longer than max_length. So extending your example,

message_array = ['Alice went to the market.', 'She bought an apple.', 'She went to the park.', 'She played.', 'She climbed.', 'She went up the ladder and down the slide.', 'After a while she got tired.', 'So she went home.']

combine_messages(message_array, 50)

['Alice went to the market. She bought an apple.', 'She went to the park. She played. She climbed.', 'She went up the ladder and down the slide.', 'After a while she got tired. So she went home.']
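If the list is long enough that recursion depth becomes a worry, the same greedy grouping can be written as a loop. This is only a sketch: iter_combined is a hypothetical name, and it assumes a single space as the separator and that a message longer than max_length should be emitted on its own rather than split.

```python
from typing import Iterator, List


def iter_combined(messages: List[str], max_length: int) -> Iterator[str]:
    """Greedily yield joined chunks of at most max_length characters.

    Hypothetical helper, not part of the answer above. A single message
    longer than max_length is yielded on its own, unchanged.
    """
    current: List[str] = []
    current_len = 0  # always equal to len(" ".join(current))
    for message in messages:
        # One extra character for the separating space, unless the chunk is empty.
        needed = len(message) + (1 if current else 0)
        if current and current_len + needed > max_length:
            # The message does not fit: close the chunk and start a new one.
            yield " ".join(current)
            current, current_len = [message], len(message)
        else:
            current.append(message)
            current_len += needed
    if current:
        # Flush whatever is left after the last message.
        yield " ".join(current)


message_array = ['Alice went to the market.', 'She bought an apple.', 'She went to the park.', 'She played.', 'She climbed.', 'She went up the ladder and down the slide.', 'After a while she got tired.', 'So she went home.']

print(list(iter_combined(message_array, 50)))
```

Because it is a generator, it never builds the cumulative-sum list and handles each message once.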

Upvotes: 0

Carl Higgs

Reputation: 331

This is a great question; I can see how there can be useful applications for a solution to this problem.

It doesn't look like the above solutions currently deliver the requested answer, at least in a straightforward and robust way. While I'm sure the below function could be optimised, I believe it solves the problem as requested and is simple to understand.

def wrap_sentences(words, limit=50, delimiter=' '):
    sentences = []
    sentence = ''
    gap = len(delimiter)
    for i, word in enumerate(words):
        if i == 0:
            sentence = word
            continue
        # combine word to sentence if under limit
        if len(sentence) + gap + len(word) <= limit:
            sentence = sentence + delimiter + word
        else:
            sentences.append(sentence)
            sentence = word

    # finally, append the sentence still being built; the loop
    # never appends the last one (or the only one) itself
    if words:
        sentences.append(sentence)

    return sentences

Using it to get the result:

>>> solution = ['Alice went to the market. She bought an apple.', 'And she then went to the park.']
>>> x = ['Alice went to the market.', 'She bought an apple.', 'And she then went to the park.']
>>> result = wrap_sentences(x,limit=50,delimiter=' ')
>>> result
['Alice went to the market. She bought an apple.', 'And she then went to the park.']
>>> result==solution
True

The function output evaluates as a match for the poster's desired answer given the same input. Also, if the limit is high and not reached, it still returns the joined sentences.

(edit: some of the terms in my function may seem odd, e.g. 'words' as the input. It's because I plan to use this function for wrapping Thai words with a no-space delimiter across multiple lines; I came across this thread while seeking a simple solution, and decided to apply it to this problem. Hopefully applying this in a general way doesn't detract from the solution!)

Upvotes: 0

Alain T.

Reputation: 42143

You can use accumulate from itertools to compute the size of the accumulated strings (+separators) and determine the maximum number of items that can be combined.

After that you can decide to combine them, and you will also know which items could not fit.

s = ['Alice went to the market.', 'She bought an apple.', 'And she then went to the park.']

from itertools import accumulate
maxCount = sum( size+sep<=50 for sep,size in enumerate(accumulate(map(len,s))) )
combined = " ".join(s[:maxCount])
unused   = s[maxCount:]

print(combined,unused)
# Alice went to the market. She bought an apple. ['And she then went to the park.']                    
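To make the arithmetic behind maxCount easier to follow, here is the same computation split into steps. The intermediate names (raw_totals, joined_totals, max_count, limit) are mine, introduced only for illustration.

```python
from itertools import accumulate

s = ['Alice went to the market.', 'She bought an apple.', 'And she then went to the park.']
limit = 50

# Running total of the raw sentence lengths, without any separators.
raw_totals = list(accumulate(map(len, s)))

# enumerate() supplies the separator count: joining items 0..i needs i spaces.
joined_totals = [size + sep for sep, size in enumerate(raw_totals)]

# Each True counts as 1, so the sum is the number of leading items that fit.
max_count = sum(total <= limit for total in joined_totals)

print(raw_totals)     # [25, 45, 75]
print(joined_totals)  # [25, 46, 77]
print(max_count)      # 2
```

Counting the True values is safe here because the running totals never decrease, so every total that fits comes before every total that does not.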

You could also obtain maxCount in a more brutal (and inefficient) way, without using accumulate:

maxCount = sum(len(" ".join(s[:n+1]))<=50 for n in range(len(s)))

Or you could do the whole thing in one line:

items = next(s[:n] for n in range(len(s),0,-1) if len(" ".join(s[:n]))<=50 )

# ['Alice went to the market.', 'She bought an apple.']

unused = s[len(items):]

# ['And she then went to the park.']

If you need to perform multiple combinations from the list to produce a new list of combined sentences (as per your latest edit to the question), you can use this in a loop:

combined = []
s        = ['Alice went to the market.', 'She bought an apple.', 'And she then went to the park.']
while s:
    items = next((s[:n] for n in range(len(s),0,-1) if len(" ".join(s[:n]))<=50), s[:1])
    combined.append(" ".join(items))
    s = s[len(items):]

print(combined)
# ['Alice went to the market. She bought an apple.', 'And she then went to the park.'] 

EDIT Changed call to the next() function to add a default. This will handle sentences that are already longer than 50 characters.
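As a quick check of that default, here is the loop wrapped in a function and run on a list containing one over-long sentence. The name combine, the limit parameter and the sample sentences are assumptions made for this sketch.

```python
def combine(sentences, limit=50):
    # Hypothetical wrapper around the while loop shown above.
    combined = []
    s = list(sentences)
    while s:
        # Longest prefix that fits; fall back to the first item alone
        # when even that one exceeds the limit.
        items = next(
            (s[:n] for n in range(len(s), 0, -1) if len(" ".join(s[:n])) <= limit),
            s[:1],
        )
        combined.append(" ".join(items))
        s = s[len(items):]
    return combined


sentences = [
    'Short one.',
    'This sentence is deliberately much longer than the fifty character limit.',
    'Another short one.',
]

for chunk in combine(sentences):
    print(len(chunk), chunk)
```

The long sentence comes out as its own item, unchanged. Without the s[:1] default, next() would raise StopIteration when it reaches that sentence.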

Upvotes: 1

Joe P

Reputation: 485

Here's a one-line solution, just because it's possible.

[x[i] for i in range(len(x)) if [sum(list(map(len,x))[:j+1]) for j in range(len(x))][i] < 50]

And here's the same more efficiently - with intermediate results to save recalculation - but still no explicit loops.

lens = list(map(len, x))
# sums[i] is the total length of x[0] through x[i] (separators not counted)
sums = [sum(lens[:i + 1]) for i in range(len(x))]
[x[i] for i in range(len(x)) if sums[i] < 50]

I doubt this is going to be more efficient than an explicit loop in any realistic case, though!

Upvotes: 3

Dave Liu

Reputation: 1132

List comprehension would probably be a little less legible, since you want to keep checking total length.

A simple function will do. This one takes the list and an optional character limit (default 50), and returns the leading items that fit.

def join_50_chars_or_less(lst, limit=50):
    """
    Takes in lst of strings and returns the leading strings whose
    space-joined length is at most `limit` chars (no substrings)

    :param lst: (list)
        list of strings to join
    :param limit: (int)
        optional limit on number of chars, default 50
    :return: (list)
        leading elements of lst that fit within `limit` chars once joined.
        No partial-strings of elements allowed.
    """
    for i in range(len(lst)):
        new_join = lst[:i+1]
        if len(' '.join(new_join)) > limit:
            return lst[:i]
    return lst

After defining the function:

>>> x = ['Alice went to the market.', 'She bought an apple.', 'And she then went to the park.']
>>> join_50_chars_or_less(x)
['Alice went to the market.', 'She bought an apple.']
>>> len('Alice went to the market. She bought an apple.')
46

And let's test against a possibly longer string:

>>> test_str = "Alice went to the market. She bought an apple on Saturday."
>>> len(test_str)
58

>>> test = test_str.split()
>>> test
['Alice', 'went', 'to', 'the', 'market.', 'She', 'bought', 'an', 'apple', 'on', 'Saturday.']

>>> join_50_chars_or_less(test)
['Alice', 'went', 'to', 'the', 'market.', 'She', 'bought', 'an', 'apple', 'on']
>>> len(' '.join(join_50_chars_or_less(test)))
48

Upvotes: 4

minterm

Reputation: 258

A not-so-elegant solution:

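# Note: this chunks by character, so a sentence can be cut mid-word
# and no separator is inserted between elements.
result = []
string = ""
for element in x:
    for char in element:
        if len(string) < 50:
            string += char  # str has no append(); build it up with +=
        else:
            result.append(string)
            string = char  # start the next chunk with this character
if len(string) > 0:
    result.append(string)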

Upvotes: 0
