shyamupa

Reputation: 1628

Regular expression to match lines which are preceded and succeeded by blank lines?

I have a file which looks like this:

This is a line which is a continuation from above .......

This is line I want to match ....

This is another line I want to match ....

This is yet another line I want to match ....

This is some regular text. Blah ...
Continuation of the regular text above ...

I want to "compact" the lines that are preceded and succeeded by blank lines, like this:

This is line I want to match ....
This is another line I want to match ....
This is yet another line I want to match ....

This is some regular text. Blah ...
Continuation of the regular text above

I tried to match the lines which are preceded and succeeded by a newline using

re.findall(r'\n\n[\w ]+\n\n')

but that failed. Any suggestions?

Upvotes: 0

Views: 309

Answers (3)

Fleshgrinder

Reputation: 16273

Python's re module isn't PCRE and doesn't support the \R (any line break) escape, so you'd have to spell the alternatives out, something like the following:

/(?=\r?\n|\x0b|\f|\r|\x85)(\r?\n|\x0b|\f|\r|\x85)(.+(\r?\n|\x0b|\f|\r|\x85))(?=\r?\n|\x0b|\f|\r|\x85)/g

Python Live Demo: http://regex101.com/r/xL8bF1 (See the pcrepattern specification for the details of the line break sequences.)

PCRE regular expression that should do what you want:

/(?=\R)\R(.+\R)(?=\R)/g

PCRE (PHP) Live Demo: http://regex101.com/r/aO8yA7

PS: Make use of the visualize whitespace feature over at regex101 for better understanding of the substitution result.
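
To try the spelled-out Python pattern outside regex101, here is a minimal sketch using re.findall. The names NL, ISOLATED_LINE and sample are my own, the sample text is copied from the question, and I am assuming plain \n line endings in the input. Like the pattern above, it only asks for a single line break in front of the line, so the last line of a multi-line paragraph that is followed by a blank line would match too.

```python
import re

# The line break alternatives from the pattern above, standing in for PCRE's \R.
NL = r'(?:\r?\n|\x0b|\f|\r|\x85)'

# Capture a line that sits between two line breaks and is followed by one more
# line break (a blank line). The trailing lookahead does not consume that blank
# line, so the next match is free to start from it.
ISOLATED_LINE = re.compile(NL + r'(.+)' + NL + r'(?=' + NL + r')')

sample = (
    "This is a line which is a continuation from above .......\n"
    "\n"
    "This is line I want to match ....\n"
    "\n"
    "This is another line I want to match ....\n"
    "\n"
    "This is yet another line I want to match ....\n"
    "\n"
    "This is some regular text. Blah ...\n"
    "Continuation of the regular text above ...\n"
)

matches = ISOLATED_LINE.findall(sample)
for line in matches:
    print(line)
```

The lookahead is the part that matters: the attempt in the question consumes both trailing newlines, so the blank line needed to start the next match is already used up.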

Upvotes: 4

mklement0

Reputation: 439377

Building on @Fleshgrinder's excellent approach to perform the substitution desired:

re.sub(r'(?=\n)\n(.+)\n(?=\n)', r'\1\n', inputString)

If you also need to make it work with input that has \r\n line endings:

re.sub(r'(?=\r?\n)\r?\n(.+)(\r?\n)(?=\r?\n)', r'\1\2', inputString)
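
One caveat worth checking against your own data: the patterns above only require a single line break before the line, so with a multi-line paragraph followed by a blank line, the paragraph's last line gets joined onto the line before it. A possible stricter variant, as a sketch only, adds a lookbehind so the line really has to be preceded by a blank line. The names ISOLATED, compact_isolated_lines and sample are hypothetical, and I am assuming that "blank" means a completely empty line.

```python
import re

# Assumption: "preceded by a blank line" means the previous line is empty, so
# the lookbehind demands a line break immediately before the blank line.
# (?<=\n) is fixed-width and covers both \n and \r\n, since both end in \n.
ISOLATED = re.compile(r'(?<=\n)\r?\n([^\r\n]+\r?\n)(?=\r?\n)')

def compact_isolated_lines(text):
    """Drop the blank line in front of each line that has a blank line on both sides."""
    return ISOLATED.sub(r'\1', text)

sample = (
    "first paragraph\n"
    "still first paragraph\n"
    "\n"
    "isolated A\n"
    "\n"
    "isolated B\n"
    "\n"
    "regular text\n"
    "more regular text\n"
)

result = compact_isolated_lines(sample)
print(result)
```

Using [^\r\n]+ instead of .+ keeps a stray \r out of the captured line, because . in Python's re only excludes \n by default.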

Assuming a Unix system and an input file named in.txt, you can test it from the command line as follows:

python -c \
  "import re,sys; print(re.sub(r'(?=\n)\n(.+)\n(?=\n)', r'\1\n', sys.argv[1]))" \
  "$(<in.txt)"

Upvotes: 1

divesh premdeep

Reputation: 1070

A simple solution using Perl (assuming the file in question is named "in.txt"):

perl -e 'undef $/; while ($file=<>) {$file=~s/(?<=\n)\n(.+)\n(?=\n)/$1\n/g; print $file}' in.txt

Basically, this reads the whole file into a single string (undef $/ turns off line-by-line reading) and then applies the substitution to that entire string.

(Note: I have assumed that this is a Unix system. You might want to add an optional check for carriage returns on Windows machines, as per @Fleshgrinder's answer.)
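
For comparison, the same whole-text idea can be sketched in Python without a regular expression, by splitting into lines and skipping each blank line that comes right before an isolated line. This is a different technique from the substitution above, not a translation of it. The function name compact_lines_linewise is hypothetical, and I am assuming that a blank line is a completely empty one. Since str.splitlines() also splits on \r\n, the carriage return case is handled by the split, but the result is rejoined with \n and the final trailing newline is not kept.

```python
def compact_lines_linewise(text):
    """Remove the blank line before each line that is followed by another blank line."""
    lines = text.splitlines()
    out = []
    for i, line in enumerate(lines):
        # A blank line is skipped when the next line is non-blank and the line
        # after that is blank again, i.e. the next line is isolated.
        next_is_isolated = (
            i + 2 < len(lines)
            and lines[i + 1] != ""
            and lines[i + 2] == ""
        )
        if line == "" and next_is_isolated:
            continue
        out.append(line)
    return "\n".join(out)

sample = (
    "This is a line which is a continuation from above .......\n"
    "\n"
    "This is line I want to match ....\n"
    "\n"
    "This is another line I want to match ....\n"
    "\n"
    "This is yet another line I want to match ....\n"
    "\n"
    "This is some regular text. Blah ...\n"
    "Continuation of the regular text above ...\n"
)

compacted = compact_lines_linewise(sample)
print(compacted)
```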

Upvotes: 0
