Reputation: 101
I wrote the following program to extract all the patterns (words, possibly hyphenated, and punctuation marks):
import re
sentence="Narrow-minded people are happy although it's cold ! I'm also happy"
print(re.split(r'([^-\w])',sentence))
The result is:
['Narrow-minded', ' ', 'people', ' ', 'are', ' ', 'happy', ' ', 'although', ' ', 'it', "'", 's', ' ', 'cold', ' ', '', '!', '', ' ', 'I', "'", 'm', ' ', 'also', ' ', 'happy']
The question is how to keep an apostrophe attached to the end of the word that precedes it. For example, we would like to retrieve "it'" instead of the pair "it", "'".
Upvotes: 1
Views: 765
Reputation: 2747
You can add words ending with an apostrophe as a special case:
print(re.split(r'([\w-]+\'|[^-\w])',sentence))
In this case, the sentence is split on either a run of \w characters or hyphens followed by an apostrophe (the [\w-]+\' part) or a single character that is neither a \w character nor a hyphen (the [^-\w] part).
This results in:
['Narrow-minded', ' ', 'people', ' ', 'are', ' ', 'happy', ' ', 'although', ' ', '', "it'", 's', ' ', 'cold', ' ', '', '!', '', ' ', '', "I'", 'm', ' ', 'also', ' ', 'happy']
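To see which branch of the alternation produces each delimiter, here is a small sketch that is for inspection only. It assumes the same two alternatives as above but wraps each in its own named group (the names word_apostrophe and other are made up for illustration), so it is not a drop-in replacement for the re.split call:

```python
import re

# Same two alternatives as in the answer, each in its own named group.
# The group names are hypothetical and only serve to label the branches.
pattern = re.compile(r"(?P<word_apostrophe>[\w-]+')|(?P<other>[^-\w])")

sentence = "although it's cold !"

# m.lastgroup is the name of the branch that produced the match.
branches = [(m.group(), m.lastgroup) for m in pattern.finditer(sentence)]
print(branches)
```

The apostrophe branch is listed first, so at the position of "it" it is tried before the single-character branch and consumes "it'" as one delimiter.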
Note that the new pattern increases the number of empty strings ('') in the list, because re.split emits an empty string whenever two delimiters are adjacent. To get rid of those, you can filter the list:
print(list(filter(None, re.split(r'([\w-]+\'|[^-\w])',sentence))))
which results in:
['Narrow-minded', ' ', 'people', ' ', 'are', ' ', 'happy', ' ', 'although', ' ', "it'", 's', ' ', 'cold', ' ', '!', ' ', "I'", 'm', ' ', 'also', ' ', 'happy']
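Putting the pieces together, a self-contained sketch could look like the following. The helper name split_keep_apostrophe and the constant TOKEN_SPLIT are hypothetical. The pattern is the one from this answer, and a list comprehension stands in for filter so the result is a list on both Python 2 and Python 3:

```python
import re

# Pattern from the answer: a word (letters, digits, underscores, hyphens)
# followed by an apostrophe, or any single non-word, non-hyphen character.
TOKEN_SPLIT = re.compile(r"([\w-]+'|[^-\w])")


def split_keep_apostrophe(text):
    """Split text on the pattern, keep the delimiters, drop empty strings."""
    return [piece for piece in TOKEN_SPLIT.split(text) if piece]


sentence = "Narrow-minded people are happy although it's cold ! I'm also happy"
tokens = split_keep_apostrophe(sentence)
print(tokens)
```

Compiling the pattern once is only a convenience if the function is called repeatedly; calling re.split directly behaves the same way.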
Upvotes: 2