Reputation: 993
I want to make a regex that will fix various grammatical errors with punctuation. There are only a few simple requirements:
So far I got this:
(?:\s*)([?!.,]+)(?:\s*)
Substituted with \1 followed by a space. This fixes points 1 and 2, but it also adds spaces between consecutive punctuation marks.
I tried running another regex just to fix point 3:
[!?.,]( )[!?,.]
but this also removes the punctuation marks themselves, even though they are not part of any capture group?
Example behavior:
Input: "what! is .this this,gdjs gf fg fddsf . . ."
Desired output: "what! is. this this, gdjs gf fg fddsf..."
Upvotes: 1
Views: 1089
Reputation: 626758
You need to match multiple punctuation symbols together with the whitespace around them, and then remove the whitespace in between the punctuation symbols within a lambda:
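import re
# Punctuation runs (possibly separated by whitespace) plus any surrounding whitespace
fix_spaces = re.compile(r'\s*([?!.,]+(?:\s+[?!.,]+)*)\s*')
text = "what! is .this this,gdjs gf fg fddsf . . ."
# Drop the spaces inside each run, then append exactly one trailing space
text = fix_spaces.sub(lambda x: "{} ".format(x.group(1).replace(" ", "")), text)
# strip() removes the space left after punctuation at the very end of the string
print(text.strip())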
See IDEONE demo.
You may use a regex inside the lambda to remove whitespace, too:
re.sub(r"\s+", "", x.group(1))
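As a minimal sketch of that variant, the whole substitution can be wrapped in a small function. The names PUNCT_RUN and fix_punctuation are hypothetical and chosen only for illustration; the pattern itself is the one from this answer. Using \s+ inside the callback also removes tabs and newlines between punctuation marks, which replace(" ", "") would leave in place.

```python
import re

# Same pattern as in the answer: a run of punctuation marks, possibly
# separated by whitespace, together with any surrounding whitespace.
PUNCT_RUN = re.compile(r'\s*([?!.,]+(?:\s+[?!.,]+)*)\s*')


def fix_punctuation(text):
    """Hypothetical helper: normalize spacing around ?, !, . and , marks."""
    def repl(match):
        # Remove every whitespace character inside the run (spaces, tabs,
        # newlines), then append exactly one space after the run.
        return re.sub(r'\s+', '', match.group(1)) + ' '

    # strip() drops the space added after punctuation at the end of the text.
    return PUNCT_RUN.sub(repl, text).strip()


print(fix_punctuation("what! is .this this,gdjs gf fg fddsf . . ."))
print(fix_punctuation("wait .\t. .\nwhat ?!"))
```

Note that this treats every . and , as sentence punctuation, so a decimal number such as 3.14 would come out as "3. 14"; inputs like that would need extra handling.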
The regex matches:
\s* - leading whitespace (zero or more)
([?!.,]+(?:\s+[?!.,]+)*) - Group 1 matching one or more characters from the [?!.,] set, followed by zero or more groups of one or more whitespace characters followed by one or more punctuation characters from the [?!.,] set
\s* - zero or more trailing whitespace characters.
Upvotes: 3
Reputation:
Based on the information you provided, which did not specify a regex flavor, I came up with the following solution.
Regex: /(?<=[A-Za-z])[?!.,]+(?= )/g
Explanation:
1) [?!.,]+(?= ) matches one or more punctuation characters followed by a space.
2) (?<=[A-Za-z]) requires the matched punctuation to be immediately preceded by a letter.
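To see what this pattern selects, here is a small sketch that runs it with Python's re module. Python is an assumption here, since the answer is flavor-agnostic; the /.../g delimiters and flag are replaced by findall and sub, which already act on every match. The angle-bracket markers are only there to make the matched span visible.

```python
import re

# Pattern from the answer, without the /.../g delimiters.
pattern = re.compile(r'(?<=[A-Za-z])[?!.,]+(?= )')

text = "what! is .this this,gdjs gf fg fddsf . . ."

# The lookbehind and lookahead are zero-width: the letter before and the
# space after are checked but are not part of the match itself.
matches = pattern.findall(text)
print(matches)

# Wrapping each match in <...> shows that only the punctuation is replaced,
# while the neighbouring letter and space are left untouched.
marked = pattern.sub(lambda m: "<" + m.group(0) + ">", text)
print(marked)
```

On the sample input this selects only the "!" in "what!", that is, punctuation that already sits directly after a letter and before a space. The misplaced marks (" .this", "this,gdjs", " . . .") do not satisfy both lookarounds and are not matched.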
Upvotes: 0