srb

Reputation: 81

How to join lines into paragraphs in Python

I have some text in the following format:

\r\n
1. \r\n
par1 par1 par1 \r\n
\r\n
par1 par1 par1 \r\n
\r\n
2. \r\n
\r\n 
par2 par2 par2

I want to join these lines into paragraphs so that the end result is:

1. par1 par1 par1 par1 par1 par1 \n
2. par2 par2 par2 \n

I have tried various string manipulations such as str.split(), str.strip() and others, and have searched the internet for solutions, but nothing seems to work.

Is there an easy way to do this programmatically? The text is very long, so doing it by hand is out of the question.

Upvotes: 2

Views: 1008

Answers (3)

Khanal

Reputation: 788

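Here is a slightly different approach using str.replace and re.sub: remove every carriage return and line feed, then put a newline back in front of each number marker.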

import re
# assuming d is the string you want to parse
d = """
\r\n
1. \r\n
par1 par1 par1 \r\n
\r\n
par1 par1 par1 \r\n
\r\n
2. \r\n
\r\n 
par2 par2 par2
"""

d = d.replace("\r", "").replace("\n", "")
d = re.sub(r'([0-9]+\.\s)\s*',r'\n\1', d).strip()
print(d)
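
One caveat with deleting "\n" outright: if a line does not end with a space, its last word gets glued to the first word of the next line. A minimal sketch that collapses whitespace to a single space instead might look like this. The name join_numbered is made up for illustration, and the code assumes every "N. " at the start of the text or after whitespace is a list marker.

```python
import re

def join_numbered(text):
    # Collapse every run of whitespace (including \r\n) into one space,
    # so words on adjacent lines are not glued together.
    flat = re.sub(r"\s+", " ", text).strip()
    # Assumption: any "N. " at the start or after a space is a list marker.
    # Put a newline in front of each one, then drop the leading newline.
    return re.sub(r"(?:^|\s)(\d+\.\s)", r"\n\1", flat).strip()

sample = "\r\n1. \r\npar1 par1 par1\r\n\r\npar1 par1 par1\r\n\r\n2. \r\n\r\npar2 par2 par2"
print(join_numbered(sample))
```

As with the original, a sentence inside a paragraph that ends in a number (for example "see step 2. Then ...") would also be treated as a marker.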

Upvotes: 1

blhsing

Reputation: 106553

Assuming your input text is stored in the variable s, you can use the following generator expression with a regex:

import re
print('\n'.join(re.sub(r'\s+', ' ', ''.join(t)).strip() for t in re.findall(r'^(\d+\.)(.*?)(?=^\d+\.|\Z)', s, flags=re.MULTILINE | re.DOTALL)))

This outputs:

1. par1 par1 par1 par1 par1 par1
2. par2 par2 par2
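
For readers who find the one-liner dense, it can be unpacked into separate steps. This is only a sketch using the same regex as above; the function name join_paragraphs and the sample string are assumptions for illustration.

```python
import re

def join_paragraphs(s):
    # Each match is a (marker, body) pair: a line-initial "N." and
    # everything up to the next line-initial marker or the end of the string.
    items = re.findall(r'^(\d+\.)(.*?)(?=^\d+\.|\Z)', s,
                       flags=re.MULTILINE | re.DOTALL)
    lines = []
    for marker, body in items:
        # Collapse all whitespace runs (including \r\n) into single spaces.
        lines.append(re.sub(r'\s+', ' ', marker + body).strip())
    return '\n'.join(lines)

s = "\r\n1. \r\npar1 par1 par1 \r\n\r\npar1 par1 par1 \r\n\r\n2. \r\n\r\n \r\npar2 par2 par2"
print(join_paragraphs(s))
```

Because the marker must sit at the start of a line (re.MULTILINE makes ^ match after each "\n"), a number followed by a dot in the middle of a line does not start a new paragraph.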

Upvotes: 2

Vineeth Sai

Reputation: 3447

I used a regex to find all the words in the string, then rejoined them depending on whether each element in the list is an integer. Hope this helps.

import re

line1 = '''\r\n
1. \r\n
par1 par1 par1 \r\n
\r\n
par1 par1 par1 \r\n
\r\n
2. \r\n
\r\n 
par2 par2 par2'''

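# Extract every word; this also drops the "." after each number
line2 = re.findall(r"[\w']+", line1)

op = ""

def isInt(item):
    # True if the token can be parsed as an integer
    try:
        int(item)
        return True
    except ValueError:
        return False

for item in line2:
    if isInt(item):
        # A number starts a new paragraph, so put the "." back
        op += "\n" + item + ". "
    else:
        op += item + " "

print(op)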

Output:

1. par1 par1 par1 par1 par1 par1 
2. par2 par2 par2 

Be wary of the extra \n in front of "1." and the trailing space at the end of each line.
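
The word-based loop also discards punctuation inside the paragraphs and treats any bare number in the text as a new item. If that matters, a line-based variant is one way around it. Note that this is a different technique from the loop above, shown only as a rough sketch: it assumes each item starts on its own line with a marker like "1.", and join_by_lines is a made-up name.

```python
def join_by_lines(text):
    paragraphs = []  # one list of line fragments per numbered item
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        head = line.split(" ", 1)[0]
        if head.endswith(".") and head[:-1].isdigit():
            # A line starting with "N." opens a new paragraph
            paragraphs.append([line])
        elif paragraphs:
            # Anything else continues the current paragraph
            paragraphs[-1].append(line)
    # Joining at the end avoids a leading "\n" and trailing spaces
    return "\n".join(" ".join(parts) for parts in paragraphs)

line1 = "\r\n1. \r\npar1 par1 par1 \r\n\r\npar1 par1 par1 \r\n\r\n2. \r\n\r\n \r\npar2 par2 par2"
print(join_by_lines(line1))
```

Text that appears before the first marker is ignored in this sketch.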

Upvotes: 0
