Reputation: 5576
I want to use Python to count the numbers of words that occur between certain punctuation characters in a block of text input. For example, such an analysis of everything written up to this point might be represented as:
[23, 2, 14]
...because the first sentence, which has no punctuation except the period at the end, has 23 words, the "For example" phrase that comes next has two, and the rest, ending with the colon, has 14.
This probably wouldn't be too hard to make, but (to go along with the "don't reinvent the wheel" philosophy that seems especially Pythonic) is there anything already out there that would be especially suitable for the task?
Upvotes: 1
Views: 1289
Reputation: 25964
Joran beat me to it, but I'll add my approach:
import re
from string import punctuation

s = 'I want to use Python to count the numbers of words that occur between certain punctuation characters in a block of text input. For example, such an analysis of everything written up to this point might be represented as'
# Split on any punctuation character; re.escape keeps ']', '\' and '^' literal inside the class.
gen = (x.split() for x in re.split('[' + re.escape(punctuation) + ']', s))
# Count the words in each segment.
list(map(len, gen))
Out[32]: [23, 2, 14]
(I love map)
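One thing the snippet above does not cover is input that ends with punctuation, or that contains runs such as "...": re.split then returns empty segments, which show up as zero counts. A minimal sketch of one way to handle that, assuming zero counts are unwanted (the function name words_between_punctuation and its delimiters parameter are hypothetical, not part of any library):

```python
import re
from string import punctuation


def words_between_punctuation(text, delimiters=punctuation):
    """Return the word count of each run of text between delimiter characters."""
    # re.escape keeps characters such as ']' and '\\' literal inside the class.
    pattern = "[" + re.escape(delimiters) + "]"
    counts = [len(segment.split()) for segment in re.split(pattern, text)]
    # Assumption: empty segments (e.g. after a trailing colon) are not wanted.
    return [n for n in counts if n > 0]


question = (
    "I want to use Python to count the numbers of words that occur between "
    "certain punctuation characters in a block of text input. For example, "
    "such an analysis of everything written up to this point might be "
    "represented as:"
)
print(words_between_punctuation(question))  # [23, 2, 14]
```

Note that string.punctuation includes the apostrophe and the hyphen, so a word like "don't" is split into two one-word segments; passing a narrower delimiters string avoids that.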
Upvotes: 4
Reputation: 113998
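import re
punctuation_i_care_about = "?.!"  # only these characters act as delimiters
# some_big_block_of_text stands for the input string
split_by_punc = re.split("[%s]" % punctuation_i_care_about, some_big_block_of_text)
words_by_punc = [len(x.split()) for x in split_by_punc]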
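To see how the choice of delimiter characters changes the result, here is a small sketch that runs the same approach on the question's opening text with two different delimiter sets. The helper name counts_for is hypothetical, and re.escape is added as a precaution so that any character can be passed safely.

```python
import re

text = (
    "I want to use Python to count the numbers of words that occur between "
    "certain punctuation characters in a block of text input. For example, "
    "such an analysis of everything written up to this point might be "
    "represented as:"
)


def counts_for(delims, block):
    """Hypothetical helper: word counts between any of the characters in delims."""
    pattern = "[%s]" % re.escape(delims)
    return [len(part.split()) for part in re.split(pattern, block)]


# Sentence-ending punctuation only: the comma and colon stay attached to words.
print(counts_for("?.!", text))    # [23, 16]
# Adding ',' and ':' reproduces the counts from the question; the final 0 is
# the empty segment after the trailing colon.
print(counts_for("?.!,:", text))  # [23, 2, 14, 0]
```

With only "?.!" the second sentence is counted as one 16-word segment, so the comma and colon need to be in the delimiter set to get the [23, 2, 14] breakdown the question asks for.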
Upvotes: 4