Oliver

Reputation: 101

Python: Removing spaces between punctuation with positive lookahead

I am trying to remove the spaces that occur between punctuation characters in a sentence. To illustrate, the dataset has many strings that look like this:

 "This is a very nice text : ) : ) ! ! ! ."

But I want them to look like this:

 "This is a very nice text :):)!!!."

I would like to do this with a regex positive lookahead. Can someone show me how to do this in Python? My current code does exactly the opposite of what I want and adds extra spaces:

 string = re.sub('([.,!?()])', r' \1', string)

Upvotes: 0

Views: 2710

Answers (2)

In principle, you could match a single space between two captured punctuation characters and replace the whole match with just the captured characters:

string = re.sub('([:.,!?()]) ([:.,!?()])', r'\1\2', string)

However, this would result in

This is a very nice text :) :) !! !.

since re.sub does not consider overlapping matches.
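To see the non-overlapping behaviour concretely, here is a minimal sketch. The helper name `squeeze_by_repetition` is made up for illustration; it simply re-runs the same capture-group substitution until the text stops changing, which is one possible workaround if you prefer to stay with capture groups.

```python
import re

# Same character class as in the substitution above.
PAIR = re.compile(r"([:.,!?()]) ([:.,!?()])")

def squeeze_by_repetition(text):
    """Hypothetical helper: repeat the substitution until nothing changes."""
    while True:
        new_text = PAIR.sub(r"\1\2", text)
        if new_text == text:
            return text
        text = new_text

sample = "This is a very nice text : ) : ) ! ! ! ."

# A single pass consumes both punctuation characters of each match,
# so the space that follows them cannot start a new match.
one_pass = PAIR.sub(r"\1\2", sample)

# Repeating the pass eventually removes the remaining spaces.
repeated = squeeze_by_repetition(sample)
```

Here `one_pass` still contains spaces between some punctuation characters, while `repeated` does not.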


Hence, you need zero-width lookahead and lookbehind assertions. They are not part of the match, so the matched portion is just the space character, which we then replace with an empty string.

string = re.sub('(?<=[:.,!?()]) (?=[:.,!?()])', '', string)

with which the result is 'This is a very nice text :):)!!!.'
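As a small check that the lookarounds really consume nothing but the space, the sketch below lists what each match contains. The variable names are only for illustration.

```python
import re

pattern = re.compile(r"(?<=[:.,!?()]) (?=[:.,!?()])")
sample = "This is a very nice text : ) : ) ! ! ! ."

# Each match is exactly one space; the neighbouring punctuation
# is only inspected by the lookbehind/lookahead, never consumed.
matched = [m.group() for m in pattern.finditer(sample)]

# Because no punctuation is consumed, every space between two
# punctuation characters is found in a single pass.
cleaned = pattern.sub("", sample)
```

Since a punctuation character can serve as the lookahead of one match and the lookbehind of the next, one call to `sub` is enough.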

Upvotes: 4

willeM_ Van Onsem

Reputation: 476584

You could use a regex like:

(?<=[.:,!?()])\s+(?=[.:,!?()])

Here the two parenthesized parts are a lookbehind and a lookahead, each of which checks for a punctuation character without consuming it. Between them, \s+ matches one or more whitespace characters, which we then replace with the empty string. For example:

import re

rgx = re.compile(r'(?<=[.:,!?()])\s+(?=[.:,!?()])')

rgx.sub('', 'This is a very nice text : ) : ) ! ! ! .')

This then produces:

>>> rgx.sub('', 'This is a very nice text : ) : ) ! ! ! .')
'This is a very nice text :):)!!!.'
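The practical difference between a literal space and `\s+` shows up when the data contains runs of spaces or tabs. The following sketch uses a made-up input string to compare the two patterns; note that `\s` also matches newlines, so the second pattern would join punctuation across line breaks as well.

```python
import re

single_space = re.compile(r"(?<=[.:,!?()]) (?=[.:,!?()])")
any_whitespace = re.compile(r"(?<=[.:,!?()])\s+(?=[.:,!?()])")

# Illustrative input: two spaces between the "!" characters,
# then a space followed by a tab before the "?".
messy = "Wow !  ! \t?"

# A literal single space never matches here, because each space
# is followed (or preceded) by more whitespace, not punctuation.
kept = single_space.sub("", messy)

# \s+ swallows the whole whitespace run between two punctuation marks.
squeezed = any_whitespace.sub("", messy)
```

With this input, `kept` is unchanged and `squeezed` is `'Wow !!?'`.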

Upvotes: 2
