Mor Brb

Reputation: 135

Python Regex - concatenating multiple lines based on a criteria

I have a text file and I want to remove the newline between adjacent lines when both lines contain only capital-letter words/characters. So if one line is ABCD and the next line is AB, the result should be ABCD AB. I can do it by looping over the text line by line, but I'd like a more elegant way, preferably with regex. Here is a text example:

ABCD
AB
abcd ABB
cd
AB
ABC
ABCD
ab

and I want to get this:

ABCD AB
abcd ABB
cd
AB ABC ABCD
ab

I've written the following, but it only works for two capital lines in a row, not more.

import re

r = re.compile(r'(\n)([A-Z ]+)(\n)([A-Z ]+)(\n)')
text = r.sub(r'\1\2 \4\5',text)
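Running the attempt on the sample text makes the two-line limit visible. This is a minimal sketch, assuming the sample ends with a trailing newline (as it would when read from a file):

```python
import re

# The question's sample text, with a trailing newline as when read from a file.
text = "ABCD\nAB\nabcd ABB\ncd\nAB\nABC\nABCD\nab\n"

# The attempt from the question: each match must start with a newline (group 1)
# and it consumes the newline after the second line (group 5).
r = re.compile(r'(\n)([A-Z ]+)(\n)([A-Z ]+)(\n)')
result = r.sub(r'\1\2 \4\5', text)
print(result)
# Only "AB ABC" is joined:
# - ABCD/AB is skipped because no newline precedes the first line;
# - the ABCD after ABC is left alone because the newline before it was
#   already consumed as group 5 of the previous match.
```

Lookarounds such as `(?=...)` avoid the second problem, because they check the next line without consuming it.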

Assume there are no complexities beyond this (the text is already as clean as the example). I'm a newbie struggling to learn regex! Thanks.
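For comparison, the line-by-line loop the question mentions can be kept short with `itertools.groupby`, which splits the lines into consecutive runs of capital-only and other lines. This is a sketch, assuming (as the regex does) that a capital-only line matches `[A-Z ]+` in full; `join_capital_lines` is a hypothetical helper name:

```python
import re
from itertools import groupby

# Same definition of a "capital-only" line as the regex: A-Z and spaces only.
CAPITAL_ONLY = re.compile(r"[A-Z ]+")


def join_capital_lines(text):
    """Join each run of adjacent capital-only lines with single spaces."""
    out = []
    # groupby yields consecutive runs sharing the same key (True/False).
    for is_caps, run in groupby(
        text.splitlines(), key=lambda line: bool(CAPITAL_ONLY.fullmatch(line))
    ):
        if is_caps:
            out.append(" ".join(run))  # a run of capital lines becomes one line
        else:
            out.extend(run)            # other lines pass through unchanged
    return "\n".join(out)


sample = "ABCD\nAB\nabcd ABB\ncd\nAB\nABC\nABCD\nab"
print(join_capital_lines(sample))
```

Note that this version does not preserve a trailing newline, since `splitlines()` drops it.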

Upvotes: 1

Views: 66

Answers (1)

zx81

Reputation: 41838

Use this search and replace:

Search: (?m)([A-Z ]+)[\r\n]+(?=[A-Z ]+$)

Replace: "\1 " (backslash-one followed by a space)

Note that we are inserting a space where there used to be a newline.

    import re

    result = re.sub(r"(?m)([A-Z ]+)[\r\n]+(?=[A-Z ]+$)", r"\1 ", subject)
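Here is a self-contained run of the answer's pattern on the question's sample, as a sketch assuming a trailing newline on the input. The lookahead `(?=[A-Z ]+$)` checks the next line without consuming it, so that line can start the next match, which is why runs of any length get joined. The `ANCHORED` pattern is a hypothetical stricter variant, not part of the answer: it adds `^` so a line that merely ends in capitals (such as `abcd ABB`) is not joined, and it matches exactly one `\n` so blank lines are not swallowed.

```python
import re

# Sample text from the question (with a trailing newline, as read from a file).
subject = "ABCD\nAB\nabcd ABB\ncd\nAB\nABC\nABCD\nab\n"

# The answer's pattern.
ANSWER = r"(?m)([A-Z ]+)[\r\n]+(?=[A-Z ]+$)"

# Hypothetical stricter variant (not from the answer): ^ anchors the match at
# the start of a line and \n matches exactly one line break.
ANCHORED = r"(?m)^([A-Z ]+)\n(?=[A-Z ]+$)"

result = re.sub(ANSWER, r"\1 ", subject)
print(result)

# A line that merely ends in capitals is where the two patterns differ.
edge = "abcd ABB\nAB\n"
loose = re.sub(ANSWER, r"\1 ", edge)      # "abcd ABB AB\n": the " ABB" tail matched
strict = re.sub(ANCHORED, r"\1 ", edge)   # "abcd ABB\nAB\n": unchanged
```

On the clean sample both patterns give the same result; they only diverge on input like `edge`.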

Upvotes: 1
