MG Fern
MG Fern

Reputation: 119

Regular expression to remove two line breaks conditionally

I'm trying to remove two line breaks character at the end of each line in a text file, but only if it follows a lowercase letter, i.e. [a-z]. I have the following code:

import re
text = "I want this \n\nwith one line break \n\nIn this place \nNothing happens \n\nHere\nnothing should\nhappen too."
texts = re.sub(r"(?<=[a-z])\r?\n\n"," ", text)
print(texts)

Text that I want to obtain

text = "I want this \nwith one line break \n\nIn this place \nNothing happens \n\nHere\nnothing should\nhappen too."

With my code nothing changes in the text. My code follows this answered question: Regular expression to remove line breaks but I am unable to adapt it to two line breaks.

Upvotes: 0

Views: 121

Answers (1)

azro
azro

Reputation: 54148

It seems you need a positive lookahead , not a lookbehind, to have a lowercase char after

import re

expected = "I want this \nwith one line break \n\nIn this place \nNothing happens \n\nHere\nnothing should\nhappen too."
text     = "I want this \n\nwith one line break \n\nIn this place \nNothing happens \n\nHere\nnothing should\nhappen too."
texts = re.sub(r"\n\n(?=[a-z])", "\n", text)

print(texts == expected)

Upvotes: 1

Related Questions