Reputation: 37
Basically, I have a string that has multiple double-whitespaces like this:
"Some text\s\sWhy is there no punctuation\s\s"
I also have a list of punctuation marks that should replace the double-whitespaces, so that the output would be this:
puncts = ['.', '?']
# applying some function
# output:
>>> "Some text. Why is there no punctuation?"
I have tried re.sub(' +', puncts[i], text), but my problem is that I don't know how to properly iterate through the list and replace the 1st double-whitespace with the 1st element in puncts, the 2nd double-whitespace with the 2nd element in puncts, and so on.
Upvotes: 1
Views: 72
Reputation: 42143
You can use re.split() to break the string into substrings between the double spaces and intersperse the punctuation marks using join:
import re
string = "Some text  Why is there no punctuation  "
iPunct = iter([". ","? "])
result = "".join(x+next(iPunct,"") for x in re.split(r"\s\s",string))
print(result)
# Some text. Why is there no punctuation?
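A related sketch, not taken from the answer above: re.sub() also accepts a function as the replacement, so the same iterator idea can work without splitting and joining. The helper name punctuate is hypothetical, and leaving the double whitespace alone once the marks run out is an assumption about the desired behavior.

```python
import re

def punctuate(text, puncts):
    """Replace each double whitespace with the next mark plus one space."""
    marks = iter(puncts)

    def repl(match):
        mark = next(marks, None)
        # No marks left: keep the original double whitespace untouched.
        return match.group(0) if mark is None else mark + " "

    # Strip the single space left after the final mark.
    return re.sub(r"\s\s", repl, text).rstrip()

print(punctuate("Some text  Why is there no punctuation  ", [".", "?"]))
# Some text. Why is there no punctuation?
```

The callback runs once per match, in order from left to right, which is what pairs the 1st double whitespace with the 1st mark, the 2nd with the 2nd, and so on.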
Upvotes: 0
Reputation: 915
If we're still using re.sub(), here's one possible solution that follows this basic pattern:
import re
puncts = ['.', '?']
text = "Some text  Why is there no punctuation  "
for i in puncts:
    # count=1: replace only the first remaining double whitespace
    text = re.sub(r'\s(?=\s)', i, text, count=1)
The call to re.sub() returns a new string. The pattern matches a whitespace character only when another whitespace character follows it, so only the first of the two is replaced with the punctuation mark, and the second stays as the space after it. The count argument of 1 limits each call to the first remaining match; by default, every match would be replaced.
If the positive lookahead (the part of the regex that we want to match but not replace) confuses you, you can also do without it:
puncts = ['.', '?']
text = "Some text  Why is there no punctuation  "
for i in puncts:
    text = re.sub(r'\s\s', i + " ", text, count=1)
This yields the same output.
There will be a leftover whitespace at the end of the sentence, but if you're picky about that, a simple text.rstrip() should take care of it.
Further explanation
Your first try with the regex ' +' doesn't work because it matches every run of one or more spaces, including the single spaces between words, so each of those would be replaced with a punctuation character too. The above solutions only match double whitespace in their respective regexes.
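To make the difference concrete, here is a small comparison, assuming the sample string from the question with literal double spaces. The variable names too_greedy and only_doubles are just for illustration.

```python
import re

text = "Some text  Why is there no punctuation  "

# ' +' matches every run of one or more spaces, including the single
# spaces between words, so each of them is replaced.
too_greedy = re.sub(" +", ".", text)
print(too_greedy)
# Some.text.Why.is.there.no.punctuation.

# ' {2}' only matches two spaces in a row, so single spaces survive.
only_doubles = re.sub(" {2}", ".", text)
print(only_doubles)
# Some text.Why is there no punctuation.
```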
Upvotes: 1
Reputation: 5414
You can do it simply using the str.replace() method!
text = "Some text  Why is there no punctuation  "
puncts = ['.', '?']
for i in puncts:
    text = text.replace("  ", i, 1)  # notice the 1 here: only the first double space is replaced
print(text)
Output: Some text.Why is there no punctuation?
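Note that the output above has no space after the period. If that space should be kept, one possible variation is to replace each double space with the mark plus a single space and strip the trailing space at the end. The function name fill_punctuation is hypothetical.

```python
def fill_punctuation(text, puncts):
    """Replace the n-th double space with the n-th mark plus one space."""
    for mark in puncts:
        # The third argument (1) limits the replacement to the first occurrence.
        text = text.replace("  ", mark + " ", 1)
    # Drop the single space left after the final mark.
    return text.rstrip()

print(fill_punctuation("Some text  Why is there no punctuation  ", [".", "?"]))
# Some text. Why is there no punctuation?
```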
Upvotes: 0