kyrenia

Reputation: 5565

How to ensure two line breaks between each paragraph in python

I am reading txt files into Python and want to make the paragraph breaks consistent. Sometimes there are one, two, three, four... occasionally several tens or hundreds of blank lines between paragraphs.

It is obviously easy to strip out all the breaks, but I can only think of "botched" ways of making everything two breaks (i.e. a single blank line between each paragraph). All I can think of is either specifying multiple strips/replaces for the different possible combinations of breaks, which gets unwieldy when the number of breaks is very large, or iteratively removing excess breaks until two are left, which I guess would be slow and not particularly scalable to many tens of thousands of txt files.

Is there a moderately fast and simple way of achieving this?

Upvotes: 0

Views: 72

Answers (2)

vks

Reputation: 67968

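import re
# Collapse any run of two or more line breaks into exactly two.
# (\r?\n) treats "\r\n" as one break, so a single Windows line ending is left alone.
x = re.sub(r"(\r?\n){2,}", r"\1\1", x)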

You can try this. Here x is the string containing all the paragraphs.
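For anyone who wants to try the idea on a sample string, here is a minimal sketch. The function name `normalize_breaks` and the sample text are made up for illustration, and it assumes that lines containing only spaces or tabs should count as blank, which the question does not state.

```python
import re

def normalize_breaks(text):
    """Collapse every run of two or more line breaks into exactly two."""
    # Normalise "\r\n" and bare "\r" endings to "\n" first,
    # so a single "\r\n" is not mistaken for two breaks.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # A newline followed by one or more blank (or whitespace-only) lines
    # becomes a newline plus exactly one blank line.
    return re.sub(r"\n(?:[ \t]*\n)+", "\n\n", text)

sample = "first\n\n\n\nsecond\r\n\r\n\r\nthird\n \n\nfourth\nstill fourth"
result = normalize_breaks(sample)
print(repr(result))
```

A single line break inside a paragraph (as in "fourth" / "still fourth") is left untouched, so this only fits texts where paragraphs are already separated by at least one blank line.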

Upvotes: 2

Praxeolitic

Reputation: 24079

Here's one way.

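# Read the whole file; text mode translates line endings to "\n".
with open("text.txt") as f:
    r = f.read()
# Keep only the non-empty lines; each one is treated as a paragraph.
pars = [p for p in r.split("\n") if p]
# Join the paragraphs with exactly one blank line between them.
print("\n\n".join(pars))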

This assumes that a paragraph is a block of text containing no line breaks.
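Since the question mentions tens of thousands of files, here is a rough sketch of how the same idea (every non-empty line is one paragraph) might be applied to a whole folder. The names `respace_paragraphs` and `respace_folder` are hypothetical, the flat folder of `.txt` files is an assumption, and the files are overwritten in place, so test it on copies first.

```python
from pathlib import Path

def respace_paragraphs(text):
    """Put exactly one blank line between consecutive non-empty lines."""
    # splitlines() splits on "\n", "\r\n" and "\r" (among other line boundaries);
    # strip() lets us drop lines that contain only whitespace.
    pars = [line for line in text.splitlines() if line.strip()]
    return "\n\n".join(pars)

def respace_folder(folder):
    """Rewrite every .txt file directly inside `folder` in place (assumed layout)."""
    for path in Path(folder).glob("*.txt"):
        path.write_text(respace_paragraphs(path.read_text()))
```

Each file is read and written once and the work per file is a single pass over its lines, so this should not be the bottleneck even for many files; disk access most likely will be.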

Upvotes: 1
