Rizou34343
Rizou34343

Reputation: 71

Python Regex for End of Line

I am trying to write a regex which adds a space before and after a dot. However I only want this if there is a space or end of line after the dot.

However I am unable to do so for end of line cases.

Eg.

I want a hotel. >> I want a hotel .
my email is [email protected] >> my email is [email protected]
I have to play. bye! >> I have to play . bye!

Following is my code:

# If "Dot and space" after word or number put space before and after
utterance = re.sub(r'(?<=[a-z0-9])[.][ $]',' . ',utterance)

How do I correct my regex to make sure my 1st example above also works, I tried putting a $ sign in square bracket but that doesn't work.

Upvotes: 7

Views: 27721

Answers (3)

Wiktor Stribiżew
Wiktor Stribiżew

Reputation: 627469

The main issue is that $ inside a character class denotes a literal $ symbol, you just need a grouping construct here.

I suggest using the following code:

import re
regex = r"([^\W_])\.(?:\s+|$)"
ss = ["I want a hotel.","my email is [email protected]", "I have to play. bye!"]
for s in ss:
    result = re.sub(regex, r"\1 . ", s).rstrip()
    print(result)

See the Python demo.

If you need to apply this on lines only without affecting line breaks, you can use

import re
regex = r"([^\W_])\.(?:[^\S\n\r]+|$)"
text = "I want a hotel.\nmy email is [email protected]\nI have to play. bye!"
print( re.sub(regex, r"\1 . ", text, flags=re.M).rstrip() )

See this Python demo.

Output:

I want a hotel . 
my email is [email protected]
I have to play . bye!

Details:

  • ([^\W_]) - Group 1 matching any letter or digit
  • \. - a literal dot
  • (?:\s+|$) - a grouping matching either 1+ whitespaces or end of string anchor (here, $ matches the end of string.)

The rstrip will remove the trailing space added during replacement.

If you are using Python 3, the [^\W_] will match all Unicode letters and digits by default. In Python 2, re.U flag will enable this behavior.

Note that \s+ in the last (?:\s+|$) will "shrink" multiple whitespaces into 1 space.

Upvotes: 3

Fred
Fred

Reputation: 76

Use the lookahead assertion (?=) to find a . followed by space or end of line \n:

utterance = re.sub('\\.(?= )|\\.(?=\n)', ' . ', utterance )

Upvotes: 2

Błotosmętek
Błotosmętek

Reputation: 12927

[ $] defines a class of characters consisting of a space and a dollar sign, so it matches on space or dollar (literally). To match on space or end of line, use ( |$) (in this case, $ keeps it special meaning.

Upvotes: 1

Related Questions