Reputation: 3577
I have a string like so:
s = "i'm sorry, sir, but this is a 'gluten-free' restaurant. we don't serve bread."
And I am trying to use re.sub to replace every special character, except an apostrophe between letters, with a space, so 'gluten-free' becomes gluten free and i'm stays as i'm.
I have tried this:
import re
s = re.sub('[^[a-z]+\'?[a-z]+]', ' ', s)
By that I mean: replace with a space anything that does not follow the pattern of one or more letters, then zero or one apostrophe, then one or more letters.
This returns the same string:
i'm sorry, sir, but this is a 'gluten-free' restaurant. we don't serve bread.
I would like to have:
i'm sorry sir but this is a gluten free restaurant we don't serve bread
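Why the attempt returns the string unchanged: square brackets do not nest in Python's re, so [^[a-z]+\'?[a-z]+] is read as the class [^[a-z] (any character except [ or a lowercase letter) repeated, then an optional apostrophe, then one or more letters, and finally a literal ]. The sample string contains no ], so nothing matches. The sketch below checks that reading on a made-up string "ab, cd]", and then shows one alternative that follows the stated intent more directly: match the words to keep (letters, optionally one apostrophe plus more letters) and join them with spaces. The findall approach is an illustration added here, not one of the answers below, and it assumes lowercase ASCII input like the sample.

```python
import re

s = "i'm sorry, sir, but this is a 'gluten-free' restaurant. we don't serve bread."

# The original attempt. Brackets don't nest, so this is the negated class
# [^[a-z], then '?[a-z]+, then a literal "]".
attempt = r"[^[a-z]+'?[a-z]+]"
print(re.findall(attempt, s))          # -> [] (there is no "]" in s)
print(re.findall(attempt, "ab, cd]"))  # -> [', cd]'] (non-letters, letters, "]")

# Alternative sketch: keep the tokens you want instead of deleting the rest.
# [a-z]+ is a word; (?:'[a-z]+)? optionally allows one inner apostrophe.
words = re.findall(r"[a-z]+(?:'[a-z]+)?", s)
print(" ".join(words))
```

Because the optional group needs a letter after the apostrophe, the closing quote in free' is simply not part of any word and disappears with the other punctuation.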
Upvotes: 1
Views: 1015
Reputation: 626802
You can use
import re
s = "i'm sorry, sir, but this is a 'gluten-free' restaurant. we don't serve bread."
print(re.sub(r"(?:(?!\b['‘’]\b)[\W_])+", ' ', s).strip())
# => i'm sorry sir but this is a gluten free restaurant we don't serve bread
See the Python demo and the regex demo.
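A few extra inputs, made up here for illustration, show two side effects of this pattern: curly apostrophes between letters are kept just like straight ones, and underscores are treated as separators even though \w includes them. The helper name clean is hypothetical; it just wraps the re.sub call above.

```python
import re

def clean(text):
    """Replace each run of non-word chars and underscores with one space,
    keeping apostrophes (straight or curly) that have a word char on both sides."""
    return re.sub(r"(?:(?!\b['‘’]\b)[\W_])+", " ", text).strip()

print(clean("we don’t serve bread."))  # curly apostrophe inside a word is kept
print(clean("snake_case, 'quoted'"))   # "_" and the surrounding quotes become spaces
print(clean("rock 'n' roll"))          # quotes next to a space are not between word chars
```

Note that rock 'n' roll comes out as rock n roll: each quote touches a space, so the \b on that side does not hold and the quote is replaced.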
Details:
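(?: - start of a non-capturing group:
(?!\b['‘’]\b) - a negative lookahead that fails the match if the current char is an apostrophe (straight or curly) with a word char on both sides
[\W_] - a non-word char or _
)+ - end of the group, repeated one or more times, so a run such as ", " becomes a single space
Upvotes: 0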
Reputation: 785128
You may use this regex, which nests a lookbehind inside a negative lookahead:
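>>> import re
>>> s = "i'm sorry, sir, but this is a 'gluten-free' restaurant. we don't serve bread."
>>> # replace the punctuation, then collapse the doubled spaces it leaves (", " -> two spaces)
>>> print(' '.join(re.sub(r"(?!(?<=[a-z])'[a-z])[^\w\s]", ' ', s, flags=re.I).split()))
i'm sorry sir but this is a gluten free restaurant we don't serve bread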
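Without any post-processing, [^\w\s] swaps each punctuation mark for a space one-for-one and leaves the neighbouring whitespace alone, so ", " turns into two spaces and the final "." leaves a trailing space. The sketch below, assuming the same sample string, prints the raw result with repr so the spacing is visible, then collapses it with str.split and str.join.

```python
import re

s = "i'm sorry, sir, but this is a 'gluten-free' restaurant. we don't serve bread."

# Replace punctuation unless it is an apostrophe with a letter on both sides.
raw = re.sub(r"(?!(?<=[a-z])'[a-z])[^\w\s]", " ", s, flags=re.I)
print(repr(raw))  # doubled spaces, e.g. between "sorry" and "sir"

# split() with no argument splits on any run of whitespace and drops
# leading/trailing whitespace, so joining with " " normalizes the spacing.
print(" ".join(raw.split()))
```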
RegEx Details:
(?!: Start negative lookahead
(?<=[a-z]): Positive lookbehind asserting that the previous character is a letter (with re.I, [a-z] also matches A-Z)
': Match an apostrophe
[a-z]: Match a letter
): End negative lookahead
[^\w\s]: Match a character that is neither whitespace nor a word character
Upvotes: 2