rapt

Reputation: 12240

Perl: regular expression: capturing group

In a code file, I want to remove any run of one or more consecutive blank lines (lines containing only zero or more spaces or tabs followed by a newline) that sits between the last line of code and the closing } of a block. The closing } may be preceded by spaces for indentation, which I want to keep.

Here is what I tried:

perl -i -0777 -pe 's/\s+\n([ ]*)\}/\n($1)\}/g' file

For example, if my code file looks like this (□ is the space character):

□□□□while (true) {\n
□□□□□□□□print("Yay!");□□□□□□\n
□□□□□□□□□□□□□□□□\n
□□□□}\n

Then I want it to become:

□□□□while (true) {\n
□□□□□□□□print("Yay!");\n
□□□□}\n

However, it does not make the change I expected. Any idea what I am doing wrong here?

Upvotes: 4

Views: 1572

Answers (4)

ysth

Reputation: 98508

perl -pi -0777 -e's/^\s*\n(?=\s*})//mg' yourfile

(This removes whitespace from the beginning of a line through a newline, provided the next non-whitespace character is }.)

Upvotes: 1

Josh Withee

Reputation: 11386

Try this regex instead, which uses a positive lookahead assertion. That way you match only the part that you want to remove, and then replace it with nothing:

s/\s+(?=\n[ ]*\})//g

Upvotes: 0

David Collins

Reputation: 3032

The only issues I can see with your regex are

  • you don't need the parentheses around the match variable $1 in the replacement (they are inserted into the output literally), and
  • the character class [ ] around a single space in the capture group is redundant (unless you want to match tabs as well as spaces).

So, you could try

s/\s+\n( *)\}/\n$1\}/g

instead.

This works as expected when run on your test input.

To tidy it up even more, you could try the following.

s/\s+(\n *\})/$1/g

If there might be tabs as well as spaces, you can use a character class. (You do not need to include '|' inside the character class).

s/\s+(\n[ \t]*\})/$1/g

Upvotes: 1

mkHun

Reputation: 5921

You can try the following one-liner:

perl -0777 -pe 's/\s*\n*(\s*\n)/$1/g' test

Upvotes: -1
