Sojimanatsu

Reputation: 601

Splitting a sentence at the last space within a specific length with regex in Python

I have been trying to truncate a sentence to a meaningful set of whole words under a specific length.

string1 = "Alice is in wonderland"
string2 = "Bob is playing games on his computer"

I want a regex that matches the leading words whose combined length, including spaces, is at most 20 characters. The expected results are:

new_string1 = "Alice is in"
new_string2 = "Bob is playing games"

Is it possible to do this with a regex?

Answers (1)

Olivier Melançon

Reputation: 22294

This is not a good use case for a regular expression. However, the textwrap.shorten function does exactly that.

import textwrap

string1 = "Alice is in wonderland"
string2 = "Bob is playing games on his computer"

new_string1 = textwrap.shorten(string1, 20, placeholder="")
new_string2 = textwrap.shorten(string2, 20, placeholder="")

print(new_string1) # Alice is in
print(new_string2) # Bob is playing games
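Passing placeholder="" is what makes this fit the question. By default textwrap.shorten appends the marker " [...]" when it drops words, and that marker counts toward the width, so fewer words survive. A small comparison on the first example string:

```python
import textwrap

sentence = "Alice is in wonderland"

# Default placeholder " [...]": the marker is counted inside the 20-character width
with_marker = textwrap.shorten(sentence, 20)
# Empty placeholder: only whole words are kept and nothing is appended
without_marker = textwrap.shorten(sentence, 20, placeholder="")

print(with_marker)     # Alice is in [...]
print(without_marker)  # Alice is in
```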

The main downside of textwrap.shorten is that it collapses runs of whitespace into single spaces. If you do not want that, you can implement your own function.

def shorten(s, max_chars):
    # Special case: the string already fits within max_chars
    if len(s) <= max_chars:
        return s.rstrip()

    stop = 0
    # len(s) > max_chars here, so indices 0..max_chars are all valid
    for i in range(max_chars + 1):
        # Remember the position of the last whitespace seen so far
        if s[i].isspace():
            stop = i

    # Cut at that whitespace (stop stays 0 if the first word is too long)
    # and get rid of any extra spaces left on the tail of the string
    return s[:stop].rstrip()

string1 = "Alice is in wonderland"
string2 = "Bob is playing games on his computer"

new_string1 = shorten(string1, 20)
new_string2 = shorten(string2, 20)

print(new_string1) # Alice is in
print(new_string2) # Bob is playing games
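Since the question asks specifically about regex, here is a minimal sketch of a regex version for comparison. The helper name shorten_regex is hypothetical, and it assumes single-line input ("." does not match newlines without re.DOTALL). The pattern greedily takes up to max_chars characters, then backtracks until the next character is whitespace or the end of the string. Like the shorten function above, it keeps the original spacing, unlike textwrap.shorten:

```python
import re
import textwrap

def shorten_regex(s, max_chars):
    # Hypothetical helper: longest prefix of s that is at most max_chars long
    # and ends right before whitespace or at the end of the string.
    # Assumes single-line input: "." does not match "\n" without re.DOTALL.
    match = re.match(r".{0,%d}(?=\s|$)" % max_chars, s)
    if match is None:
        # The first word alone is longer than max_chars
        return ""
    # Drop trailing whitespace left behind by runs of several spaces
    return match.group().rstrip()

print(shorten_regex("Alice is in wonderland", 20))                # Alice is in
print(shorten_regex("Bob is playing games on his computer", 20))  # Bob is playing games

# Unlike textwrap.shorten, the original spacing is preserved
spaced = "Alice  is in wonderland"
print(textwrap.shorten(spaced, 20, placeholder=""))  # Alice is in
print(shorten_regex(spaced, 20))                      # Alice  is in
```

The lookahead is what keeps words whole: the match can only end where a space or the end of the string follows, so a word that would cross the limit is dropped entirely. It works, but the explicit loop is easier to read and adjust.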
