edrftg21

Reputation: 37

Replace substrings with items from list

Basically, I have a string that has multiple double-whitespaces like this:

"Some text\s\sWhy is there no punctuation\s\s"

I also have a list of punctuation marks that should replace the double-whitespaces, so that the output would be this:

puncts = ['.', '?']

# applying some function
# output:
>>> "Some text. Why is there no punctuation?"

I have tried re.sub(' +', puncts[i], text) but my problem here is that I don't know how to properly iterate through the list and replace the 1st double-whitespace with the 1st element in puncts, the 2nd double-whitespace with the 2nd element in puncts and so on.

Upvotes: 1

Views: 72

Answers (3)

Alain T.

Reputation: 42143

You can use re.split() to break the string into substrings between the double spaces and intersperse the punctuation marks using join:

import re
string = "Some text  Why is there no punctuation  "
iPunct = iter([". ","? "])
result = "".join(x+next(iPunct,"") for x in re.split(r"\s\s",string))
print(result)
# Some text. Why is there no punctuation?
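The same iterator idea can also be used with re.sub() directly, since re.sub() accepts a function as the replacement and calls it once per match, left to right. This is a different technique from the split/join above, shown only as a minimal sketch; the helper name next_punct and the final rstrip() are my own choices, not part of the answer.

```python
import re

text = "Some text  Why is there no punctuation  "
punct_iter = iter([".", "?"])

def next_punct(match):
    # Called once for each double-whitespace match, in order.
    # If the list runs out, the gap collapses to a single space.
    return next(punct_iter, "") + " "

# rstrip() removes the space added after the final punctuation mark.
result = re.sub(r"\s\s", next_punct, text).rstrip()
print(result)  # Some text. Why is there no punctuation?
```

Because the string is scanned only once, this avoids re-searching from the start for every punctuation mark.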

Upvotes: 0

chang_trenton

Reputation: 915

If we're still using re.sub(), here's one possible solution that follows this basic pattern:

  1. Get the next punctuation character.
  2. Replace only the first remaining double whitespace in text with that character.
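import re

puncts = ['.', '?']
text = "Some text  Why is there no punctuation  "
for i in puncts:
    # Match one whitespace character that is followed by another; count=1 stops after the first match.
    text = re.sub(r'\s(?=\s)', i, text, count=1)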

Each call to re.sub() returns a new string. The pattern matches a whitespace character that is immediately followed by another whitespace character, and only that first character is replaced with the punctuation mark; the lookahead leaves the second one in place as the space between sentences. The final argument, 1, is the count: it limits the call to the first match instead of replacing every match (the default behavior).
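To make the effect of the count argument concrete, here is a small sketch on a made-up sample string (the string and variable names are illustrative only):

```python
import re

sample = "a  b  c"  # hypothetical input with two double spaces

# Default behavior: every match is replaced.
replaced_all = re.sub(r"\s\s", ".", sample)

# count=1: only the first match is replaced.
replaced_first = re.sub(r"\s\s", ".", sample, count=1)

print(replaced_all)    # a.b.c
print(replaced_first)  # a.b  c
```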

If the positive lookahead (the part of the regex that we want to match but not replace) confuses you, you can also do without it:

import re

puncts = ['.', '?']
text = "Some text  Why is there no punctuation  "
for i in puncts:
    text = re.sub(r'\s\s', i + " ", text, count=1)

This yields the same output.

A single whitespace character is left over at the end of the string; if that bothers you, text = text.rstrip() removes it.

Further explanation:

Your first attempt with the regex ' +' doesn't work because it matches every run of one or more spaces, including the single spaces between words, and re.sub() then replaces each of those runs with the punctuation character. The solutions above match only double whitespace.
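A quick way to check this is to count what each pattern matches. This is only an illustrative sketch; the variable names are mine.

```python
import re

text = "Some text  Why is there no punctuation  "

# ' +' matches every run of one or more spaces, single spaces included.
any_runs = re.findall(r" +", text)

# ' {2}' matches only two consecutive spaces.
double_runs = re.findall(r" {2}", text)

print(len(any_runs))             # 7
print(len(double_runs))          # 2
print(re.sub(r" +", ".", text))  # Some.text.Why.is.there.no.punctuation.
```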

Upvotes: 1

Taohidul Islam

Reputation: 5414

You can do it simply using the replace method!

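text = "Some text  Why is there no punctuation  "
puncts = ['.', '?']

for i in puncts:
    # The third argument (1) limits each call to the first remaining double space.
    # Appending " " keeps the gap between the sentences.
    text = text.replace("  ", i + " ", 1)

print(text.rstrip())  # rstrip() drops the space left after the final mark

Output: Some text. Why is there no punctuation?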

Upvotes: 0
