Limon

Reputation: 993

Punctuation correction with regex

I want to write a regex that fixes various punctuation spacing errors. There are only a few simple requirements:

So far I got this:

(?:\s*)([?!.,]+)(?:\s*) 

Substituted with "\1 " (the captured group followed by a space). This fixes points 1 and 2, but it also adds spaces between consecutive punctuation marks.

I tried running another regex just to fix point 3:

[!?.,]( )[!?,.]

But this also removes the punctuation marks themselves, even though they are not part of the capture group. Why?

Example behavior:

Input: "what! is .this this,gdjs gf fg fddsf . . ."

Desired output: "what! is. this this, gdjs gf fg fddsf..."

Upvotes: 1

Views: 1089

Answers (2)

Wiktor Stribiżew

Reputation: 626758

You need to match runs of punctuation symbols together with the whitespace around and between them, and then remove the whitespace in between the punctuation symbols inside a lambda:

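import re

# Optional whitespace, a run of punctuation (possibly split by whitespace), optional whitespace
fix_spaces = re.compile(r'\s*([?!.,]+(?:\s+[?!.,]+)*)\s*')
text = "what! is .this this,gdjs gf fg fddsf . . ."
# Remove spaces inside the punctuation run, then append a single space after it
text = fix_spaces.sub(lambda x: "{} ".format(x.group(1).replace(" ", "")), text)
# strip() drops the space added after the final punctuation run
print(text.strip())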

See IDEONE demo.

You may use a regex inside the lambda to remove whitespace, too:

re.sub(r"\s+", "", x.group(1))
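Put together, that variant might look like the sketch below. The function name fix_punctuation is only an illustrative choice and does not come from the question; the pattern is the one shown above. Using \s+ inside the replacement also strips tabs and newlines between punctuation marks, which replace(" ", "") would leave in place.

```python
import re

# Same pattern as above: optional whitespace, a run of punctuation that may
# contain internal whitespace, then optional whitespace.
FIX_SPACES = re.compile(r'\s*([?!.,]+(?:\s+[?!.,]+)*)\s*')

def fix_punctuation(text):
    """Hypothetical helper: normalize spacing around ?, !, . and , marks."""
    def repl(match):
        # Drop every whitespace character inside the punctuation run,
        # then append exactly one space after it.
        return re.sub(r"\s+", "", match.group(1)) + " "
    # strip() removes the space appended after a trailing punctuation run
    # (and any other leading or trailing whitespace in the text).
    return FIX_SPACES.sub(repl, text).strip()

print(fix_punctuation("what! is .this this,gdjs gf fg fddsf . . ."))
# what! is. this this, gdjs gf fg fddsf...
```

Keep in mind that the pattern knows nothing about context, so a decimal such as 3.14 would come out as "3. 14".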

The regex matches:

  • \s* - leading whitespace (zero or more)
  • ([?!.,]+(?:\s+[?!.,]+)*) - Group 1, matching one or more characters from the [?!.,] set, followed by zero or more sequences of one or more whitespace characters and one or more punctuation marks from the [?!.,] set
  • \s* - zero or more trailing whitespace.
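To see those three parts at work, a small sketch (assuming Python's re module, as in the code above) can list each full match next to what Group 1 captured for the sample input from the question:

```python
import re

pattern = re.compile(r'\s*([?!.,]+(?:\s+[?!.,]+)*)\s*')
text = "what! is .this this,gdjs gf fg fddsf . . ."

# Pair each full match (including surrounding whitespace) with Group 1
# (the punctuation run, which may still contain internal whitespace).
matches = [(m.group(0), m.group(1)) for m in pattern.finditer(text)]
for whole, punct in matches:
    print(repr(whole), "->", repr(punct))
# '! ' -> '!'
# ' .' -> '.'
# ',' -> ','
# ' . . .' -> '. . .'
```

The last match shows why the replacement has to clean up Group 1: the captured run ". . ." still carries the inner spaces.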

Upvotes: 3

user2705585

Reputation:

Based on the information you provided, which did not specify a regex flavor, I came up with the following solution.

Regex: /(?<=[A-Za-z])[?!.,]+(?= )/g

Explanation:

1) [?!.,]+(?= ) matches one or more punctuation marks that are followed by a space.

2) (?<=[A-Za-z]) requires the matched punctuation to be immediately preceded by a letter.

Regex101 Demo
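
A quick way to check what this pattern selects is to run it over the sample input from the question. The sketch below assumes Python's re module, the flavor used in the other answer; the /.../g delimiters and flag are dropped because findall already returns every match.

```python
import re

# Punctuation run that follows a letter and is followed by a space.
pattern = re.compile(r'(?<=[A-Za-z])[?!.,]+(?= )')
text = "what! is .this this,gdjs gf fg fddsf . . ."

found = pattern.findall(text)
print(found)  # ['!']
```

On this input only the already well-placed "!" matches; the other marks are either preceded by a space or not followed by one, so they fall outside the pattern.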

Upvotes: 0
