Regex: Remove all special characters that are not apostrophes between letters

Question

I have a string like so:

s = "i'm sorry, sir, but this is a 'gluten-free' restaurant. we don't serve bread."

and I am trying to use re.sub to replace every special character with a space, except for apostrophes that sit between two letters, so that 'gluten-free' becomes gluten free while i'm stays as i'm.

I have tried this:

import re

s = re.sub('[^[a-z]+\'?[a-z]+]', ' ', s)

My intent was to replace, with a space, anything that does not follow this pattern: one or more letters, then zero or one apostrophe, then one or more letters.

However, this returns the string unchanged:

i'm sorry, sir, but this is a 'gluten-free' restaurant. we don't serve bread.

I would like to have:

i'm sorry  sir  but this is a  gluten free  restaurant  we don't serve bread

anubhava · Accepted Answer

You may use this regex, which nests a lookbehind inside a negative lookahead:

>>> import re
>>> s = "i'm sorry, sir, but this is a 'gluten-free' restaurant. we don't serve bread."
>>> print(re.sub(r"(?!(?<=[a-z])'[a-z])[^\w\s]", ' ', s, flags=re.I))
i'm sorry  sir  but this is a  gluten free  restaurant  we don't serve bread


RegEx Details:

(?!: Start negative lookahead
- (?<=[a-z]): Positive lookbehind to assert that the previous character is a letter
- ': Match an apostrophe
- [a-z]: Match letter [a-z]
): End negative lookahead
[^\w\s]: Match a single character that is neither whitespace nor a word character (letter, digit or underscore)
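
The breakdown above can also be written directly into the pattern using re.VERBOSE, which may make it easier to maintain. This is only a sketch: the names PATTERN and strip_specials are made up for illustration and do not appear in the original answer.

```python
import re

# Same pattern as in the answer, spread over several lines.
# re.VERBOSE makes the regex engine ignore the whitespace and comments.
# PATTERN and strip_specials are hypothetical names used for this sketch.
PATTERN = re.compile(
    r"""
    (?!              # fail at this position if what follows is:
        (?<=[a-z])   #   preceded by a letter,
        '            #   an apostrophe,
        [a-z]        #   and then another letter
    )
    [^\w\s]          # otherwise match one non-word, non-whitespace character
    """,
    re.IGNORECASE | re.VERBOSE,
)


def strip_specials(text):
    """Replace special characters with a space, keeping apostrophes between letters."""
    return PATTERN.sub(" ", text)


s = "i'm sorry, sir, but this is a 'gluten-free' restaurant. we don't serve bread."
print(strip_specials(s))
```

Two things to keep in mind: the final period is also replaced, so the result ends with a trailing space (" ".join(result.split()) collapses the extra spaces if that matters), and an underscore counts as a word character, so [^\w\s] leaves it in place.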
