Hana

Reputation: 101

Extracting words with an apostrophe as a possible final character

I wrote the following program to extract all the tokens (words, possibly hyphenated, and punctuation marks) from a sentence:

import re

sentence = "Narrow-minded people are happy although it's cold ! I'm also happy"
print(re.split(r'([^-\w])', sentence))

The result is:

['Narrow-minded', ' ', 'people', ' ', 'are', ' ', 'happy', ' ', 'although', ' ', 'it', "'", 's', ' ', 'cold', ' ', '', '!', '', ' ', 'I', "'", 'm', ' ', 'also', ' ', 'happy']

The question is how to keep the apostrophe attached to the end of the preceding word. For example, we would like to retrieve "it'" instead of the pair "it", "'".

Upvotes: 1

Views: 765

Answers (1)

ebo

Reputation: 2747

You can add words ending with an apostrophe as a special case:

print(re.split(r"([\w-]+'|[^-\w])", sentence))

In this case, the sentence is split on either

  • a sequence of one or more \w characters or dashes followed by an apostrophe (the [\w-]+' part)
  • OR any single character that is neither a dash nor a \w character (the [^-\w] part)
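To see what each alternative matches on its own, here is a minimal sketch that runs re.findall on the two parts separately. The variable names word_apostrophe and separator are illustrative only and do not appear in the question.

```python
import re

sentence = "Narrow-minded people are happy although it's cold ! I'm also happy"

# First alternative: a run of word characters or dashes ending in an apostrophe.
word_apostrophe = r"[\w-]+'"
# Second alternative: one character that is neither a dash nor a word character.
separator = r"[^-\w]"

apostrophe_words = re.findall(word_apostrophe, sentence)
separators = re.findall(separator, sentence)

print(apostrophe_words)  # ["it'", "I'"]
print(separators)        # spaces, the two apostrophes, and '!'
```

When the two parts are combined with |, the engine takes the leftmost match, so "it'" is consumed as a whole before the bare apostrophe can match as a separator.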

This results in:

['Narrow-minded', ' ', 'people', ' ', 'are', ' ', 'happy', ' ', 'although', ' ', '', "it'", 's', ' ', 'cold', ' ', '', '!', '', ' ', '', "I'", 'm', ' ', 'also', ' ', 'happy']

Note that this increases the number of empty strings ('') in the list. To get rid of them, you can filter the list (the list() call is needed in Python 3, where filter returns an iterator):

print(list(filter(None, re.split(r"([\w-]+'|[^-\w])", sentence))))

This results in:

['Narrow-minded', ' ', 'people', ' ', 'are', ' ', 'happy', ' ', 'although', ' ', "it'", 's', ' ', 'cold', ' ', '!', ' ', "I'", 'm', ' ', 'also', ' ', 'happy']
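For reuse, the split and the filtering can be wrapped in one function. This is only a sketch of the approach above: the name split_keep_apostrophe is hypothetical, and a list comprehension replaces filter, which gives the same result here.

```python
import re

def split_keep_apostrophe(text):
    """Split text into tokens, keeping a trailing apostrophe on its word."""
    # The capturing group makes re.split return the separators as well.
    parts = re.split(r"([\w-]+'|[^-\w])", text)
    # Drop the empty strings that appear between adjacent matches.
    return [part for part in parts if part]

sentence = "Narrow-minded people are happy although it's cold ! I'm also happy"
tokens = split_keep_apostrophe(sentence)
print(tokens)
```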

Upvotes: 2
