898247

Reputation: 10606

Extracting as many full words as fit within a certain number of characters

I'd like to take a block of text and extract as many whole words as possible that fit within a given number of characters. What tools/libraries can I use to accomplish this?

For example, in the given block of text:

Have you managed to get your hands on Nikon's elusive D4 full-frame DSLR? 
It should be smooth sailing from here, with the occasional firmware update being 
your only critical acquisition going forward. D4 firmware 1.02 brings a handful of 
minor fixes, but if you're in need of any of the enhancements listed below, it's 
surely a must have:

If I assign that to a string and then do string = string[0:100], I get the first 100 characters, but the word 'sailing' is cut off to 'sailin'. I'd like the text to be cut off at the space before 'sailing' (either just before or just after that space).
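A minimal sketch reproducing the problem, assuming the sample paragraph is stored in a variable `text` (an illustrative name) on one line with single spaces. It also shows one standard-library option, `textwrap.shorten`, which collapses whitespace and keeps as many whole words as fit in `width`. Passing `placeholder=""` suppresses its default ` [...]` marker. By default `textwrap` may also break after the hyphen in a word like `full-frame`; pass `break_on_hyphens=False` if that is unwanted.

```python
import textwrap

# Sample paragraph from the question, joined onto one line with single spaces.
text = (
    "Have you managed to get your hands on Nikon's elusive D4 full-frame DSLR? "
    "It should be smooth sailing from here, with the occasional firmware update being "
    "your only critical acquisition going forward. D4 firmware 1.02 brings a handful of "
    "minor fixes, but if you're in need of any of the enhancements listed below, it's "
    "surely a must have:"
)

# Naive slicing: the 100th character falls inside 'sailing'.
naive = text[:100]
print(naive.endswith("sailin"))  # True

# textwrap.shorten keeps only whole words; placeholder="" drops the " [...]" marker.
short = textwrap.shorten(text, width=100, placeholder="")
print(short)
```

Here `short` ends with "It should be smooth", which is 93 characters in total.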

Upvotes: 1

Views: 157

Answers (3)

JBernardo

Reputation: 33397

Using regex:

>>> import re
>>> re.match(r'(.{,100})\W', text).group(1)
"Have you managed to get your hands on Nikon's elusive D4 full-frame DSLR? It should be smooth"

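This approach breaks at any non-word character (punctuation such as hyphens and apostrophes, not only spaces), and the result is at most 100 characters long. Note that . does not match newlines by default, so the text needs to be on one line (or pass re.DOTALL).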
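To illustrate the point about punctuation: because `\W` matches any non-word character, the break can land on a hyphen as well as a space. A small sketch, using an illustrative limit of 62 characters chosen only so that the cut falls inside `full-frame`, with a space-only split for comparison:

```python
import re

# First part of the sample text, on one line (illustrative).
text = ("Have you managed to get your hands on Nikon's elusive D4 full-frame DSLR? "
        "It should be smooth sailing from here")

# Regex approach: the greedy .{,62} backs off until the next character is \W,
# which here is the hyphen inside 'full-frame'.
by_regex = re.match(r'(.{,62})\W', text).group(1)
print(by_regex)

# Space-only approach for comparison: break at the last space in the first 62 chars.
by_space = text[:62].rsplit(' ', 1)[0]
print(by_space)
```

The regex result ends in "D4 full", while the space-only split ends in "D4".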

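The first pattern drops the last word of text shorter than 100 characters (and finds no match at all if the text contains no non-word character). To handle such short strings, the following regex is better, because it also accepts the end of the string as a break point: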

re.match(r'(.{,100})(\W|$)', text).group(1)
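A sketch that wraps the second pattern in a function. The name `truncate_words` and the `limit` parameter are hypothetical, and adding `re.DOTALL` is an assumption about the input (multi-line text), so that `.` also matches newlines. If there is no break point at all (for example one word longer than `limit`), `re.match` returns `None`, so the sketch falls back to a hard cut.

```python
import re


def truncate_words(text: str, limit: int = 100) -> str:
    """Return at most `limit` characters of `text`, cut at a non-word character.

    `truncate_words` and `limit` are illustrative names, not a library API.
    """
    # (\W|$) lets text that already fits match all the way to its end;
    # re.DOTALL makes '.' also match newlines in multi-line input.
    m = re.match(r'(.{,%d})(\W|$)' % limit, text, re.DOTALL)
    # No break point at all (e.g. one very long word): fall back to a hard cut.
    return m.group(1) if m else text[:limit]


print(re.match(r'(.{,100})\W', "short text").group(1))  # short
print(truncate_words("short text"))                     # short text
print(truncate_words("short text", 7))                  # short
```

The first line shows the original pattern losing the last word of a short string; the function keeps it.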

Upvotes: 3

Antimony

Reputation: 39451

This cuts the string at the last space within the first 100 characters, if there is one; otherwise it falls back to a hard cut at 100 characters.

lastSpace = string[:100].rfind(' ')
string = string[:lastSpace] if (lastSpace != -1) else string[:100]
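One possible refinement of the two lines above, sketched as a function (the name `cut_at_last_space` and the `limit` parameter are hypothetical). It returns strings that already fit unchanged, and it looks at `limit + 1` characters so that a space immediately after a complete word still counts as a break point.

```python
def cut_at_last_space(string: str, limit: int = 100) -> str:
    """Variant of the rfind approach (hypothetical helper name)."""
    if len(string) <= limit:
        return string  # already fits: keep every word
    # Look one character past the limit so that a space directly after a
    # complete word still counts as a break point.
    last_space = string[:limit + 1].rfind(' ')
    # No space found: fall back to a hard cut, as in the original.
    return string[:last_space] if last_space != -1 else string[:limit]


print(cut_at_last_space("abcd efgh ijkl", 9))  # abcd efgh
print(cut_at_last_space("abcdefghij", 5))      # abcde
```

If the window were limited to the first 9 characters, as in the original two lines, the first call would return "abcd", because the space after "efgh" is the 10th character.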

Upvotes: 0

Ned Batchelder

Reputation: 375604

If you just want to break the string on whitespace, use this:

my_string = my_string[:100].rsplit(None, 1)[0]

Keep in mind, though, that you might actually want to split on more than just whitespace (hyphens or other punctuation, for example).
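As a sketch of splitting on more than whitespace, the hypothetical helper `cut_at_break` below also treats the position right after a hyphen as a break point. Hyphens are only an example here; the pattern would need adjusting for other punctuation, and this is an illustration rather than a complete word-breaking algorithm.

```python
import re


def cut_at_break(text: str, limit: int = 100) -> str:
    """Cut at the last whitespace or hyphen within `limit` characters (hypothetical helper)."""
    if len(text) <= limit:
        return text
    # Look one character past the limit so a break right after a full word counts.
    head = text[:limit + 1]
    # Break points: a whitespace character (cut before it) or the position
    # just after a hyphen that is followed by a word character (keep the hyphen).
    cuts = [m.start() for m in re.finditer(r'\s|(?<=-)(?=\w)', head) if m.start() > 0]
    return text[:cuts[-1]].rstrip() if cuts else text[:limit]


sample = ("Have you managed to get your hands on Nikon's elusive D4 full-frame DSLR? "
          "It should be smooth sailing from here")
print(cut_at_break(sample, 62))
print(sample[:62].rsplit(None, 1)[0])
```

With a 62-character limit, the helper keeps "full-" (breaking inside "full-frame"), while the rsplit(None, 1) line stops at "D4".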

Upvotes: 1
