Christian Wolf

Reputation: 1237

How to create regex search-and-replace with comments?

I have a bit of a strange problem: I have some code (it's LaTeX, but that does not matter here) with long lines, each containing several sentences that end in a period. For better version control, I want to put each sentence on its own line. This can be achieved with sed 's/\. /.\n/g'.

The problem arises when a line also contains a comment (starting with %) that has periods in it. Comments must not be altered; otherwise the part of the comment that ends up on a new line is parsed as LaTeX code, which may cause errors.

As a minimal example, take

Foo. Bar. Baz. % A. comment. with periods.

The result should be

Foo.
Bar.
Baz. % ...

Alternatively, the comment could go on the next line without any problems.

Using perl would be fine if that works out better. I tried a few ideas with both sed and perl, but none did what I expected: either the comment was altered as well, or only the first period was replaced (perl -pe 's/^([^%]*?)\. /\1.\n/g').

Can you point me in the right direction?

Upvotes: 3

Views: 71

Answers (2)

Jeff Y

Reputation: 2456

Putting the comment by itself on a following line can be done with sed pretty easily, using the hold space:

sed '/^[^.]*%/b;/%/!{s/\. /.\n/g;b};h;s/[^%]*%/%/;x;s/ *%.*//;s/\. /.\n/g;G'

Or if you want the comment by itself before the rest:

sed '/^[^.]*%/b;/%/!{s/\. /.\n/g;b};h;s/ *%.*//;s/\. /.\n/g;x;s/[^%]*%/%/;G'

Finally, it is also possible to keep the comment on the same line as the last sentence:

sed '/^[^.]*%/b;/%/!{s/\. /.\n/g;b};h;s/[^%]*%/%/;x;s/ *%.*//;s/\. /.\n/g;G;s/\n\([^\n]*\)$/ \1/'
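For readers unfamiliar with the hold space, here is the first command above spelled out as a commented, multi-line script. This is only a sketch: the function name split_sentences is made up for illustration, and it assumes GNU sed, since \n in the replacement text is not portable.

```shell
#!/bin/sh
# split_sentences is a hypothetical wrapper name, not part of the answer.
# Assumes GNU sed (uses \n in the replacement part of s///).
split_sentences() {
  sed '
    # A "%" occurs before any period: nothing to split, print as is.
    /^[^.]*%/b
    # No comment on this line: split every sentence, then finish.
    /%/!{
      s/\. /.\n/g
      b
    }
    # Save a copy of the whole line in the hold space.
    h
    # Reduce the pattern space to the comment only.
    s/[^%]*%/%/
    # Swap: pattern space = full line, hold space = comment.
    x
    # Remove the comment (and the spaces before it) from the code part.
    s/ *%.*//
    # Put each sentence of the code part on its own line.
    s/\. /.\n/g
    # Append a newline followed by the saved comment.
    G
  '
}

printf '%s\n' 'Foo. Bar. Baz. % A. comment. with periods.' | split_sentences
```

On the sample line from the question this should print Foo., Bar. and Baz. on separate lines, followed by the untouched comment on a fourth line.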

Upvotes: 1

Michael Carman

Reputation: 30841

This is tricky, as you're essentially trying to match all occurrences of ". " that aren't preceded by a "%" somewhere earlier on the line. A negative look-behind would be useful here, but Perl doesn't support look-behind of arbitrary, unbounded width. (Though there are hideous ways of faking it in certain situations.) We can get by without it here using backtracking control verbs:

s/(?:%(*COMMIT)(*FAIL))|\.\K (?!%)/\n/g;

The (?:%(*COMMIT)(*FAIL)) forces the replacement to stop the first time it sees a "%": (*FAIL) unconditionally fails, and backtracking into (*COMMIT) makes the whole match fail without trying any later starting position, so nothing after the "%" is ever examined. The "real" match is the second alternative: \.\K (?!%) looks for a space that follows a period and isn't followed by a "%", which keeps a trailing comment on the same line as the last sentence. The \K causes the period to not be included in the match, so we don't have to include it in the replacement. We only match and replace the space.

Upvotes: 4
