Reputation: 10606
I'd like to take a block of text and extract as many whole words as will fit in a given number of characters. What tools or libraries can I use to do this?
For example, in the given block of text:
Have you managed to get your hands on Nikon's elusive D4 full-frame DSLR?
It should be smooth sailing from here, with the occasional firmware update being
your only critical acquisition going forward. D4 firmware 1.02 brings a handful of
minor fixes, but if you're in need of any of the enhancements listed below, it's
surely a must have:
If I assign that to a string and then do string = string[0:100], I get the first 100 characters, but the word 'sailing' is cut off to 'sailin'. I'd like the text to be cut off right before or after the space before 'sailing'.
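One standard-library option worth considering is textwrap.shorten, which collapses runs of whitespace and then drops whole words from the end until the text fits in the given width. A minimal sketch, assuming the sample paragraph is held in a variable named text. Because whitespace is collapsed first, the result is not always a literal prefix of the input (line breaks become single spaces):

```python
import textwrap

# The sample paragraph from the question, joined into one line.
text = (
    "Have you managed to get your hands on Nikon's elusive D4 full-frame DSLR? "
    "It should be smooth sailing from here, with the occasional firmware update being "
    "your only critical acquisition going forward. D4 firmware 1.02 brings a handful of "
    "minor fixes, but if you're in need of any of the enhancements listed below, it's "
    "surely a must have:"
)

# shorten() collapses whitespace, then drops whole words from the end until
# the result fits in `width` characters. placeholder="" suppresses the
# default " [...]" marker; break_on_hyphens=False keeps "full-frame" intact.
short = textwrap.shorten(text, width=100, placeholder="", break_on_hyphens=False)
print(short)
# Have you managed to get your hands on Nikon's elusive D4 full-frame DSLR? It should be smooth
```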
Upvotes: 1
Views: 157
Reputation: 33397
Using regex:
>>> import re
>>> re.match(r'(.{,100})\W', text).group(1)
"Have you managed to get your hands on Nikon's elusive D4 full-frame DSLR? It should be smooth"
This lets you break at any non-word character between words (punctuation as well as spaces), not only at spaces. The captured group is at most 100 characters long.
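For strings that may be shorter than 100 characters, the following regex is better, because it also accepts the end of the string, so the last word of a short string is not dropped: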
re.match(r'(.{,100})(\W|$)', text).group(1)
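The regex can be wrapped in a small helper that also covers the case where re.match returns None, which happens when the first limit + 1 characters contain no non-word character at all (for example, one very long word); calling .group(1) on None would raise AttributeError. A sketch under those assumptions: truncate_words and limit are hypothetical names, and re.DOTALL is added so that . also crosses line breaks. Note that \W also matches the apostrophe, so a cut can land inside a word such as "Nikon's".

```python
import re

def truncate_words(text: str, limit: int = 100) -> str:
    """Return the longest prefix of at most `limit` characters that is
    followed by a non-word character or by the end of the string."""
    # re.DOTALL lets "." match newlines, so multi-line input is handled too.
    match = re.match(r"(.{0,%d})(?:\W|$)" % limit, text, re.DOTALL)
    if match is None:
        # No break point within reach: fall back to a hard cut.
        return text[:limit]
    return match.group(1)
```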
Upvotes: 3
Reputation: 39451
This cuts the string at the last space within the first 100 characters. If there is no space there, it falls back to a hard cut at 100 characters.
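if len(string) > 100:  # a string that already fits would otherwise lose its last word
    lastSpace = string[:100].rfind(' ')  # -1 if there is no space
    string = string[:lastSpace] if lastSpace != -1 else string[:100]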
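The same idea can be packaged as a reusable function. This sketch adds two small adjustments that go beyond the answer: text that already fits is returned unchanged, and the search looks one character past the limit, so a word ending exactly at the limit is kept. truncate_at_space and limit are hypothetical names.

```python
def truncate_at_space(text: str, limit: int = 100) -> str:
    """Cut `text` at the last space that keeps it within `limit` characters."""
    if len(text) <= limit:
        # Already short enough: nothing to cut.
        return text
    # Include one extra character so that a space sitting exactly at index
    # `limit` counts, i.e. a word ending right at the limit is kept whole.
    last_space = text[:limit + 1].rfind(" ")
    if last_space == -1:
        # No space at all: fall back to a hard cut at the limit.
        return text[:limit]
    return text[:last_space]
```

For example, truncate_at_space("one two three", 7) returns "one two", whereas searching only text[:7] would stop at "one".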
Upvotes: 0
Reputation: 375604
If you only want to break the string on whitespace, use this:
if len(my_string) > 100:  # strings that already fit are left unchanged
    my_string = my_string[:100].rsplit(None, 1)[0]
Keep in mind, though, that you may want to split on more than just spaces, for example on hyphens or other punctuation.
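As one way of splitting on more than whitespace, this sketch scans back from the limit for the last character in a configurable separator set. The default set (space, tab, newline and hyphen), the names truncate_at_separator and DEFAULT_SEPARATORS, and the rule of keeping a hyphen attached to the fragment before it are all assumptions chosen for illustration.

```python
# Hypothetical default: break on whitespace and hyphens.
DEFAULT_SEPARATORS = " \t\n-"

def truncate_at_separator(text: str, limit: int = 100,
                          separators: str = DEFAULT_SEPARATORS) -> str:
    """Return at most `limit` characters of `text`, cut at a separator."""
    if len(text) <= limit:
        return text
    # text[limit] is the first character that does not fit; scan backwards
    # from there for the last separator.
    for i in range(limit, 0, -1):
        if text[i] in separators:
            if text[i] == "-" and i < limit:
                # Keep the hyphen attached to the fragment before it.
                return text[:i + 1]
            # Whitespace (or a hyphen that would not fit) is dropped.
            return text[:i].rstrip()
    # No separator found: fall back to a hard cut.
    return text[:limit]
```

With limit=63, the sample paragraph is cut to end in "D4 full-" rather than dropping "full-frame" entirely.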
Upvotes: 1