N_B

Retaining punctuation within words

How can I remove punctuation from a line with re, while keeping punctuation that appears inside a word?

For example:

Input = "Hello!!!, i don't like to 'some String' .... isn't"
Output = ['hello', 'i', "don't", 'like', 'to', 'some', 'string', "isn't"]

I am trying to do this:

import re
re.sub(r'\W+', ' ', myLine.lower()).split()

But this splits words like "don't" into "don" and "t".
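The split happens because the apostrophe is not a word character, so \W+ treats it exactly like "!!!," or "....". A minimal check of the attempt above, using the example input from the question (the variable names line and words are just for illustration):

```python
import re

line = "Hello!!!, i don't like to 'some String' .... isn't"

# \W matches anything outside [A-Za-z0-9_], including the apostrophe,
# so every apostrophe is turned into a separator along with the real punctuation.
words = re.sub(r'\W+', ' ', line.lower()).split()
print(words)
# ['hello', 'i', 'don', 't', 'like', 'to', 'some', 'string', 'isn', 't']
```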

Answers (1)

anubhava

You can use lookarounds in your regex:

>>> import re
>>> text = "Hello!!!, i didn''''t don't like to 'some String' .... isn't"
>>> regex = r'\W+(?!\S*[a-z])|(?<!\S)\W+'
>>> print(re.sub(regex, '', text, flags=re.IGNORECASE).split())
['Hello', 'i', "didn''''t", "don't", 'like', 'to', 'some', 'String', "isn't"]

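\W+(?!\S*[a-z])|(?<!\S)\W+ has two alternatives. \W+(?!\S*[a-z]) matches a run of non-word characters that is not followed by a letter later in the same whitespace-delimited token, i.e. trailing punctuation such as "!!!,". (?<!\S)\W+ matches a run of non-word characters at the start of a token (preceded by whitespace or the start of the string), such as the opening quote in 'some. An apostrophe between letters, as in "don't", matches neither alternative, so it is kept.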
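A minimal sketch of how the pattern above could be wrapped to produce exactly the lowercase list the question asks for, plus a look at which substrings it actually deletes. The names PUNCT and tokenize are hypothetical, not part of the answer:

```python
import re

# The answer's pattern: trailing punctuation with no letter after it in the
# same token, or leading punctuation at the start of a token.
PUNCT = re.compile(r'\W+(?!\S*[a-z])|(?<!\S)\W+', re.IGNORECASE)

def tokenize(line):
    """Hypothetical helper: strip edge punctuation, lowercase, split on whitespace."""
    return PUNCT.sub('', line).lower().split()

line = "Hello!!!, i don't like to 'some String' .... isn't"
print(tokenize(line))
# ['hello', 'i', "don't", 'like', 'to', 'some', 'string', "isn't"]

# The substrings the pattern removes from the line:
print(PUNCT.findall(line))
# ['!!!,', "'", "' ...."]
```

Lowercasing after the substitution is why re.IGNORECASE is still needed here: it lets the [a-z] in the lookahead see capital letters such as the "S" in "String".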
