Reputation: 829
I have a list in which each item is a sentence. I want to join the items as long as the new combined item does not go over a character limit.
You can join items in a list fairly easily.
x = ['Alice went to the market.', 'She bought an apple.', 'And she then went to the park.']
' '.join(x)
>>> 'Alice went to the market. She bought an apple. And she then went to the park.'
Now say I would like to sequentially join the items as long as the new combined item is not greater than 50 characters.
The result would be:
['Alice went to the market. She bought an apple.','And she then went to the park.']
I could maybe use a list comprehension like here, or a conditional iterator like here, but in both cases I run into problems where the sentences get cut off.
Upvotes: 3
Views: 1984
Reputation: 2620
I started from Joe's answer, pulled out the max index with the first_greater_elem method from this answer, and came up with this set of helper methods.
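from typing import List

def combine_messages(message_array: List[str], max_length: int) -> List[str]:
    if not message_array:
        return []
    lengths = list(map(len, message_array))
    # length of " ".join(message_array[:i + 1]): the characters plus i separators
    sums = [sum(lengths[:i + 1]) + i for i in range(len(message_array))]
    # always take at least one item, so an over-long sentence cannot recurse forever
    max_index = max(first_greater_elem(sums, max_length), 1)
    if max_index < len(message_array):
        result = [" ".join(message_array[:max_index])]
        result.extend(combine_messages(message_array[max_index:], max_length))
        return result
    return [" ".join(message_array)]

def first_greater_elem(lst, elem):
    # index of the first item strictly greater than elem, or len(lst) if there is none
    for i, item in enumerate(lst):
        if item > elem:
            return i
    return len(lst)
It recursively keeps combining elements into strings of at most max_length characters, counting the joining spaces. So, extending your example: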
message_array = ['Alice went to the market.', 'She bought an apple.', 'She went to the park.', 'She played.', 'She climbed.', 'She went up the ladder and down the slide.', 'After a while she got tired.', 'So she went home.']
combine_messages(message_array, 50)
['Alice went to the market. She bought an apple.', 'She went to the park. She played. She climbed.', 'She went up the ladder and down the slide.', 'After a while she got tired. So she went home.']
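If the list is long, the recursion can hit Python's recursion limit. Here is a rough iterative sketch of the same greedy grouping, using prefix sums and a binary search instead of recursion. The name combine_messages_iter is my own, not from the linked answers, and I'm assuming a single space as the separator.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List


def combine_messages_iter(messages: List[str], max_length: int) -> List[str]:
    """Hypothetical iterative variant: greedy grouping without recursion."""
    # prefix[k] = total length of the first k messages, each counted with one
    # trailing separator, so " ".join(messages[a:b]) has length
    # prefix[b] - prefix[a] - 1.
    prefix = [0] + list(accumulate(len(m) + 1 for m in messages))
    result = []
    start = 0
    while start < len(messages):
        # largest end such that prefix[end] - prefix[start] - 1 <= max_length
        end = bisect_right(prefix, prefix[start] + max_length + 1) - 1
        # an over-long message still gets an entry of its own
        end = max(end, start + 1)
        result.append(" ".join(messages[start:end]))
        start = end
    return result


message_array = ['Alice went to the market.', 'She bought an apple.',
                 'She went to the park.', 'She played.', 'She climbed.',
                 'She went up the ladder and down the slide.',
                 'After a while she got tired.', 'So she went home.']
for line in combine_messages_iter(message_array, 50):
    print(len(line), line)
```

Each group is found with one binary search, so the lengths are summed only once for the whole list.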
Upvotes: 0
Reputation: 331
This is a great question; I can see how there can be useful applications for a solution to this problem.
It doesn't look like the above solutions currently deliver the requested answer, at least in a straightforward and robust way. While I'm sure the below function could be optimised, I believe it solves the problem as requested and is simple to understand.
def wrap_sentences(words, limit=50, delimiter=' '):
    sentences = []
    sentence = ''
    gap = len(delimiter)
    for i, word in enumerate(words):
        if i == 0:
            sentence = word
            continue
        # combine word to sentence if under limit
        if len(sentence) + gap + len(word) <= limit:
            sentence = sentence + delimiter + word
        else:
            sentences.append(sentence)
            sentence = word
    # finally, append the sentence that is still pending after the loop
    # (covers a single word, a trailing group, and input that never hits the limit)
    if words:
        sentences.append(sentence)
    return sentences
Using it to get the result:
>>> solution = ['Alice went to the market. She bought an apple.', 'And she then went to the park.']
>>> x = ['Alice went to the market.', 'She bought an apple.', 'And she then went to the park.']
>>> result = wrap_sentences(x,limit=50,delimiter=' ')
>>> result
['Alice went to the market. She bought an apple.', 'And she then went to the park.']
>>> result==solution
True
The function output evaluates as a match for the poster's desired answer given the same input. Also, if the limit is high and not reached, it still returns the joined sentences.
(edit: some of the terms in my function may seem odd, e.g. 'words' as the input. That's because I plan to use this function for wrapping Thai words, with an empty delimiter, across multiple lines; I came across this thread while looking for a simple solution and decided to apply it to this problem. Hopefully applying it in a general way doesn't detract from the solution!)
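To illustrate the no-space use case, here is a small generator sketch of the same greedy idea with a configurable delimiter. The name wrap_items and the toy tokens are made up for this example; treat it as a sketch rather than a drop-in replacement.

```python
def wrap_items(items, limit=50, delimiter=' '):
    """Hypothetical greedy wrapper: yields joined groups that stay within
    `limit` characters whenever a single item allows it."""
    current = None
    for item in items:
        if current is None:
            # first item of a new group
            current = item
        elif len(current) + len(delimiter) + len(item) <= limit:
            current += delimiter + item
        else:
            yield current
            current = item
    # emit the group that is still pending when the input runs out
    if current is not None:
        yield current


tokens = ['ab', 'cd', 'ef', 'gh', 'ij']
print(list(wrap_items(tokens, limit=4, delimiter='')))   # ['abcd', 'efgh', 'ij']
print(list(wrap_items(tokens, limit=5, delimiter='-')))  # ['ab-cd', 'ef-gh', 'ij']
print(list(wrap_items(['only'], limit=50)))              # ['only']
```

Yielding the pending group after the loop is what keeps the last line, and a single-item input, from being dropped.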
Upvotes: 0
Reputation: 42143
You can use accumulate from itertools to compute the size of the accumulated strings (+separators) and determine the maximum number of items that can be combined.
After that, you can decide to combine them, and you will also know which items did not fit.
s = ['Alice went to the market.', 'She bought an apple.', 'And she then went to the park.']
from itertools import accumulate
maxCount = sum( size+sep<=50 for sep,size in enumerate(accumulate(map(len,s))) )
combined = " ".join(s[:maxCount])
unused = s[maxCount:]
print(combined,unused)
# Alice went to the market. She bought an apple. ['And she then went to the park.']
You could also obtain maxCount in a more brutal (and inefficient) way, without using accumulate:
maxCount = sum(len(" ".join(s[:n+1]))<=50 for n in range(len(s)))
Or you could do the whole thing in one line:
items = next(s[:n] for n in range(len(s),0,-1) if len(" ".join(s[:n]))<=50 )
# ['Alice went to the market.', 'She bought an apple.']
unused = s[len(items):]
# ['And she then went to the park.']
If you need to perform multiple combinations from the list to produce a new list of combined sentences (as per your latest edit to the question), you can use this in a loop:
combined = []
s = ['Alice went to the market.', 'She bought an apple.', 'And she then went to the park.']
while s:
    items = next((s[:n] for n in range(len(s),0,-1) if len(" ".join(s[:n]))<=50), s[:1])
    combined.append(" ".join(items))
    s = s[len(items):]
print(combined)
# ['Alice went to the market. She bought an apple.', 'And she then went to the park.']
EDIT Changed call to the next() function to add a default. This will handle sentences that are already longer than 50 characters.
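To see what the default buys you, here is the same loop wrapped in a function and fed a sentence that is longer than the limit on its own. The function name group_sentences and the test strings are just for illustration.

```python
def group_sentences(sentences, limit=50):
    """Illustrative wrapper around the while-loop approach."""
    remaining = list(sentences)  # copy, so the caller's list is left alone
    combined = []
    while remaining:
        items = next(
            (remaining[:n] for n in range(len(remaining), 0, -1)
             if len(" ".join(remaining[:n])) <= limit),
            remaining[:1],  # default: nothing fits, so keep the over-long sentence whole
        )
        combined.append(" ".join(items))
        remaining = remaining[len(items):]
    return combined


long_sentence = "x" * 60
print(group_sentences(["Short one.", long_sentence, "Another."]))
```

Without the default, next() would raise StopIteration on the 60-character item; with it, that item simply becomes an entry of its own.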
Upvotes: 1
Reputation: 485
Here's a one-line solution, just because it's possible.
[x[i] for i in range(len(x)) if [sum(list(map(len,x))[:j+1]) for j in range(len(x))][i] < 50]
And here's the same more efficiently - with intermediate results to save recalculation - but still no explicit loops.
lens = list(map(len, x))
# running totals of the item lengths (joining spaces are not counted)
sums = [sum(lens[:i + 1]) for i in range(len(x))]
[x[i] for i in range(len(x)) if sums[i] < 50]
I doubt this is going to be more efficient than an explicit loop in any realistic case, though!
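If you want to check that for yourself, something like this rough timeit sketch would do. The two function names are made up here, and the numbers will vary by machine, so I'm not quoting any.

```python
import timeit

x = ['Alice went to the market.', 'She bought an apple.',
     'And she then went to the park.']


def with_comprehension(items, limit=50):
    # comprehension version: re-sums a prefix of the lengths for every index
    lens = list(map(len, items))
    sums = [sum(lens[:i + 1]) for i in range(len(items))]
    return [items[i] for i in range(len(items)) if sums[i] < limit]


def with_loop(items, limit=50):
    # explicit loop: one running total, stops at the first item that does not fit
    kept, total = [], 0
    for item in items:
        total += len(item)
        if total >= limit:
            break
        kept.append(item)
    return kept


# both variants select the same items (neither counts the joining spaces)
assert with_comprehension(x) == with_loop(x)

big = x * 200  # a longer list makes the difference visible
for fn in (with_comprehension, with_loop):
    print(fn.__name__, timeit.timeit(lambda: fn(big), number=20))
```

The comprehension does quadratic work on the length list, while the loop can stop as soon as the limit is reached.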
Upvotes: 3
Reputation: 1132
List comprehension would probably be a little less legible, since you want to keep checking total length.
A simple function will do. This one takes the list of strings and an optional character limit, which defaults to 50.
def join_50_chars_or_less(lst, limit=50):
    """
    Takes in lst of strings and returns join of strings
    up to `limit` number of chars (no substrings)
    :param lst: (list)
        list of strings to join
    :param limit: (int)
        optional limit on number of chars, default 50
    :return: (list)
        string elements joined up until length of 50 chars.
        No partial-strings of elements allowed.
    """
    for i in range(len(lst)):
        new_join = lst[:i+1]
        if len(' '.join(new_join)) > limit:
            return lst[:i]
    return lst
After defining the function:
>>> x = ['Alice went to the market.', 'She bought an apple.', 'And she then went to the park.']
>>> join_50_chars_or_less(x)
['Alice went to the market.', 'She bought an apple.']
>>> len('Alice went to the market. She bought an apple.')
46
And let's test against a possibly longer string:
>>> test_str = "Alice went to the market. She bought an apple on Saturday."
>>> len(test_str)
58
>>> test = test_str.split()
>>> test
['Alice', 'went', 'to', 'the', 'market.', 'She', 'bought', 'an', 'apple', 'on', 'Saturday.']
>>> join_50_chars_or_less(test)
['Alice', 'went', 'to', 'the', 'market.', 'She', 'bought', 'an', 'apple', 'on']
>>> len(' '.join(join_50_chars_or_less(test)))
48
Upvotes: 4
Reputation: 258
A not-so-elegant solution:
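result = []
string = ""
for element in x:
    # start a new string when adding this sentence would go over 50 characters
    if string and len(string) + 1 + len(element) > 50:
        result.append(string)
        string = ""
    # str has no append(); build the string with concatenation instead
    string = element if not string else string + " " + element
if len(string) > 0:
    result.append(string)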
Upvotes: 0