Reputation: 71
I am trying to write a regex which adds a space before and after a dot. However, I only want this if the dot is followed by a space or the end of the line.
However, I am unable to get it working for the end-of-line case.
E.g.:
I want a hotel. >> I want a hotel .
my email is [email protected] >> my email is [email protected]
I have to play. bye! >> I have to play . bye!
Following is my code:
# If "Dot and space" after word or number put space before and after
utterance = re.sub(r'(?<=[a-z0-9])[.][ $]',' . ',utterance)
How do I correct my regex so that my first example above also works? I tried putting a $ sign inside the square brackets, but that doesn't work.
Upvotes: 7
Views: 27721
Reputation: 627469
The main issue is that $ inside a character class denotes a literal $ symbol; you just need a grouping construct here.
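A quick check of that difference with the standard re module (the sample strings are made up for illustration): the bracketed version needs an actual character after the dot, while the grouped version can match at the end of the string.

```python
import re

# Inside [...], $ is just a literal dollar sign.
literal = re.compile(r"[.][ $]")
# Inside a group, $ keeps its meaning as the end-of-string anchor.
anchor = re.compile(r"[.](?: |$)")

print(bool(literal.search("I want a hotel.")))  # False: nothing follows the final dot
print(bool(literal.search("costs 5.$")))        # True: ".$" is matched literally
print(bool(anchor.search("I want a hotel.")))   # True: $ matches after the dot
```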
I suggest using the following code:
import re
regex = r"([^\W_])\.(?:\s+|$)"
ss = ["I want a hotel.", "my email is [email protected]", "I have to play. bye!"]
for s in ss:
    result = re.sub(regex, r"\1 . ", s).rstrip()
    print(result)
If you need to apply this line by line without affecting the line breaks, you can use:
import re
regex = r"([^\W_])\.(?:[^\S\n\r]+|$)"
text = "I want a hotel.\nmy email is [email protected]\nI have to play. bye!"
print(re.sub(regex, r"\1 . ", text, flags=re.M).rstrip())
Output:
I want a hotel .
my email is [email protected]
I have to play . bye!
Details:
- ([^\W_]) - Group 1, matching any letter or digit
- \. - a literal dot
- (?:\s+|$) - a non-capturing group matching either one or more whitespace characters or the end-of-string anchor (here, $ matches the end of the string)

The rstrip() call removes the trailing space added during the replacement.
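The same pattern can be written with the re.VERBOSE flag so each part from the list above carries its own comment; this is only a re-spelling for readability, and it behaves like the compact pattern.

```python
import re

# The answer's pattern, spelled out with re.VERBOSE (whitespace and # comments ignored).
regex = re.compile(r"""
    ([^\W_])     # Group 1: one letter or digit (\w without the underscore)
    \.           # a literal dot
    (?:\s+|$)    # non-capturing group: 1+ whitespace characters, or end of string
""", re.VERBOSE)

m = regex.search("I want a hotel.")
print(m.group(1), repr(m.group(0)))

# The replacement leaves a trailing space after the last dot; rstrip() removes it.
print(repr(regex.sub(r"\1 . ", "I want a hotel.")))
print(repr(regex.sub(r"\1 . ", "I want a hotel.").rstrip()))
```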
If you are using Python 3, [^\W_] matches all Unicode letters and digits by default. In Python 2, the re.U flag enables this behavior.
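A small Python 3 check of that claim (the sample characters are arbitrary choices): with a str pattern, [^\W_] accepts accented letters and non-ASCII digits but not the underscore, and passing re.ASCII narrows it back to ASCII letters and digits.

```python
import re

# Default Python 3 str pattern: Unicode-aware \W, so the class is "any Unicode letter/digit".
alnum = re.compile(r"[^\W_]")
print(bool(alnum.fullmatch("é")), bool(alnum.fullmatch("٣")), bool(alnum.fullmatch("_")))

# With re.ASCII, \W treats "é" as a non-word character, so the class no longer matches it.
ascii_alnum = re.compile(r"[^\W_]", re.ASCII)
print(bool(ascii_alnum.fullmatch("é")))

# The answer's regex therefore also handles a word ending in a non-ASCII letter.
accented = re.sub(r"([^\W_])\.(?:\s+|$)", r"\1 . ", "Un café.").rstrip()
print(accented)
```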
Note that \s+ in the final (?:\s+|$) group will "shrink" a run of multiple whitespace characters into a single space.
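Because the whitespace after the dot is consumed and replaced, the multi-line version also leaves a trailing space at the end of every line except the last (rstrip() only trims the end of the whole string). The sketch below compares it with a lookahead variant, which is an alternative of my own and not part of the answer: it only checks what follows the dot, so it inserts " ." and keeps the original whitespace (including runs of spaces) untouched.

```python
import re

text = "I want a hotel.\nI have to play.    bye!"

# Answer's multi-line pattern: consumes the horizontal whitespace after the dot.
consuming = re.sub(r"([^\W_])\.(?:[^\S\n\r]+|$)", r"\1 . ", text, flags=re.M).rstrip()

# Lookahead variant (alternative sketch): zero-width check for whitespace or a line end,
# so nothing after the dot is removed and no trailing space is added.
lookahead = re.sub(r"([^\W_])\.(?=\s|$)", r"\1 .", text, flags=re.M)

print(repr(consuming))
print(repr(lookahead))
```

Which one to use depends on whether collapsing repeated spaces is desirable.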
Upvotes: 3
Reputation: 76
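Use a lookahead assertion (?=...) to find a . followed by a space or the end of a line. The lookahead does not consume the space, so only " ." needs to be inserted, and re.M makes $ match at the end of every line as well as at the end of the string:
import re
utterance = re.sub(r'\.(?= |$)', ' .', utterance, flags=re.M)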
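The flag matters when the text spans several lines. In this sketch (the two-line sample is made up), without re.M the $ only matches at the very end of the string or just before a final newline, so a dot before an inner "\n" is skipped:

```python
import re

text = "I want a hotel.\nI have to play. bye!"

# No flag: $ cannot match before the inner "\n", so "hotel." is left alone.
without_m = re.sub(r"\.(?= |$)", " .", text)
# re.M: $ also matches just before each newline.
with_m = re.sub(r"\.(?= |$)", " .", text, flags=re.M)

print(repr(without_m))
print(repr(with_m))
```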
Upvotes: 2
Reputation: 12927
[ $] defines a character class consisting of a space and a dollar sign, so it matches either a space or a literal dollar sign. To match a space or the end of the line, use ( |$) (in this case, $ keeps its special meaning).
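Applied to the question's own code, a minimal sketch (space_dots is a hypothetical helper name) looks like this; the lookbehind is kept as in the question, so it still only fires after a lowercase ASCII letter or a digit. At the end of the string the replacement ' . ' leaves a trailing space, which rstrip() trims:

```python
import re

def space_dots(utterance: str) -> str:
    # Question's lookbehind unchanged; [ $] replaced by the group ( |$),
    # so $ is the end-of-string anchor rather than a literal dollar sign.
    spaced = re.sub(r'(?<=[a-z0-9])[.]( |$)', ' . ', utterance)
    # When the dot is at the very end, ' . ' leaves a trailing space; remove it.
    return spaced.rstrip()

for s in ["I want a hotel.", "my email is [email protected]", "I have to play. bye!"]:
    print(space_dots(s))
```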
Upvotes: 1